+1 917-310-0088 iPhone (New York City, USA) | lev.selector (at) gmail.com


●Machine Learning and Artificial Intelligence specialist interested in projects in New York City

●Proven track record of building data and analytics systems from scratch, and delivering them on time

●Ph.D. in mathematical modeling and computer simulations

●15+ years of experience with high-volume data processing and analytics

●Experience using modern ML & AI methods (Random Forest, XGBoost, Logistic Regression, Neural Networks)

●Experience with Financial, Advertising, e-Commerce, Media and Publishing industries

●Full cycle planning and development from business requirements to software architectures & implementation

●Data collection (high-volume web scraping, APIs), Data Engineering (big data loads, ETL), Data Mining

●Building teams - hiring, training, leading

●Experience with Amazon Cloud, Google Cloud, IBM Watson

●Databases (SQL databases and key-value stores like Redis)

●Web applications, and business intelligence & analytics


●Machine Learning & Artificial Intelligence, NLP (Natural Language Processing), Data Science

●Mathematics and Finance: Ph.D. (Math. Modeling), Advanced Calculus, Probability and Statistics, Time Series Analysis, Numerical Methods, Machine Learning, Deep Learning, Data Mining

●Programming & Data Science Tools: Python, Pandas, NumPy, Go (Golang), SQL, ETL, AWS SageMaker, Scikit-Learn, TensorFlow, Hadoop, Perl, JavaScript, C/C++, Java, Excel VBA, databases (PostgreSQL, MySQL, Vertica, Netezza, MongoDB, Sybase, DB2, Oracle, MS SQL), Web Apps, Cloud (Amazon AWS, Google, IBM).

●Software Architecture

●Building Teams and Managing Projects


June 2018 – present – Tata Consultancy Services, Senior Data Scientist - Machine Learning and Artificial Intelligence, Scikit-Learn, AWS SageMaker

●Built models for cybersecurity & anomaly detection using Random Forest, XGBoost, Logistic Regression

●Prepared a course (12 lectures) - Machine Learning and AI

●Multiple presentations on using Machine Learning and Artificial Intelligence in Finance

●Designed architecture for migrating data into the cloud (AWS) for analytics

June 2017 – June 2018 – Selectorweb, consulting projects - Machine Learning, AI, Analytics, NLP (Natural Language Processing), Anaconda Python3, Scikit-Learn, NLTK, Pandas, NumPy, TensorFlow, Amazon AWS (EC2, S3), Google Cloud (BigQuery, MySQL), IBM Cloud (IBM Watson – Natural Language Understanding for Sentiment Analysis, Tone Analyzer, Personality Insights)

●Galvanize - lectures about Deep Learning and AI

●SaleZoom - built web/Excel reporting system (hierarchical reports)

●JKCF - Machine Learning using Logistic Regression, Random Forest and NLP (Natural Language Processing) on highly imbalanced data. Built multiple mathematical models for selecting scholarship candidates. Data preparation, feature extraction/construction. Compared different approaches to the imbalanced-data problem and improved the accuracy of the model

November 2014 – June 2017 - Penguin Random House, Consultant - Business Intelligence/Analytics, Big Data Collection and Integration (ETL/ELT), Machine Learning and AI, Reporting. Python (pandas, numpy, numba, cython), TensorFlow, Netezza, SQL, Redis, Amazon Cloud (AWS CLI, EC2, EBS, S3), Linux

●Created Python framework to work with IBM Netezza database

●Worked with several business groups to create various data feeds and tools to clean and ingest data

●Set up jobs to process millions of rows of vendor web data daily

●Wrote software for high-volume data collection (web scraping at ~2 million pages/day, APIs, files)

●Implemented projects on Amazon Cloud (AWS EC2) for parallel data collection and processing

●Wrote analytics tools for price analysis and estimation

●Managed work of several programmers, taught Python, analytics tools, machine learning and AI

●Organized "Deep Learning Book Club" to promote the use of Machine Learning and AI in-house

April 2012 - April 2014 – AppNexus, Inc, Consultant, Financial Data Analytics - Designed and implemented systems for processing big data (~60TB/day), data extraction, aggregation, data analytics, billing, and reporting. Linux, Python, Pandas, Vertica, MySQL, Hadoop, Hive, Git.

●Designed and implemented a new billing and reporting framework, which extracts, aggregates, and processes hourly trading data. More than 43 thousand lines of structured Python code. Processing with custom rules for approximately 2000 different clients. Worked with multiple teams (Data Team, Finance, Product, and Sales Operations). Self-recoverable ETL processes and health-monitoring processes (automatic self-recovery after outages). Automatic back-testing to validate both data and code. The code was responsible for more than $3 billion in billing (invoices and payments) in its first 3 years

●Cut monthly billing data extraction time from 3 hours to 7 minutes (more than 30 billion data rows)

●Designed business intelligence & analytics systems (multiple reports, graphs)

●Was responsible for daily billing and reporting operations (monitoring data loaders, validating the data and business requirements, running calculations, validating the numbers before importing them into the ERP system).

●Trained the team to use and maintain the billing/reporting systems and cost_revenue database. Documented the code and processes. Taught courses on Python/pandas to the company’s employees

April 2010 – April 2012 – JPMorgan Chase & Co., Investment Banking, Consultant - Mortgage Analytics (Structured Products Group). Unix/Linux, C/C++, Perl, Python, Sybase, Excel VBA

●Played a key role in migrating legacy C++ calculators (Fixed Income MBS) across data centers. Fixed and migrated ~100 applications, including old C/C++ applications, batch jobs, scripts used by modeling groups, and web applications. Wrote multiple Perl modules and applications (data feeds, utilities, monitoring, docs-builder system) both to help the migration and to streamline the nightly batch processes

●Instrumented analytical calculators to measure CPU time, instrument IDs and categories, etc. These metrics were later used to generate reports showing how resources were allocated

●Developed Excel VBA apps for regression analysis, and as a frontend for analytical C++ calculators

1994-2009 - multiple "Wall Street" consulting projects (Waterhouse Securities, Cantor Fitzgerald/eSpeed, Morgan Stanley, Goldman Sachs, CSFB, JPMorgan Chase, Merrill Lynch, HSBC, Citigroup, WorldQuant) - Data Processing and Analytics using Unix, Perl, SQL (Sybase, DB2, MySQL), data preparation, migration, ETL (Extract, Transform, Load), Web design (HTML, CSS, JavaScript). Also C, C++, Java, Jython, Excel VBA

1991-1994 Columbia University. Staff Associate - Mathematical modeling of the dynamics of organic molecules. Programs were written in C and distributed to several Unix computers in different labs. Operations were controlled from one desktop Macintosh computer, results were fed from Unix to the Mac, and custom programs (Mac C++ and IGOR software) automatically processed the results

1981-1991 National Cardiology Research Center, Moscow, Russia. Researcher - Real-time data acquisition and computer processing in neurophysiological experiments. Semi-automatic pattern recognition and data categorization. Computer simulations of nerve impulse generation and propagation along C-fibers. Partial differential equations, Hodgkin-Huxley model, Crank-Nicolson & modified Runge–Kutta methods. Hardware and software design of medical equipment


●1988 Ph.D. in Mathematics (modeling of nervous coding), Moscow Institute of Physics and Technology

●1981 MS in Automation, Moscow Institute of Physics and Technology (MIPT), majoring in computers, electronics and biophysics, Diploma - computer simulation of nerve activity


Coursera courses (Machine Learning, Deep Learning, Google Cloud). Data Analysis with Python and Pandas. SEC Registered Representative (Series 7, Series 63). CQF (Certificate in Quantitative Finance). Advanced Object Oriented Perl. C++ for Quantitative Finance. Advanced Excel for Financial Applications. Java 2 (Sun)