Yanwei Zhang

Graduate Research Assistant (865) 696-8290 (mobile)

College of Computing

Georgia Institute of Technology 329957 Georgia Tech Station, Atlanta, GA 30332-1300

RESEARCH INTERESTS

I have broad interests in computer systems. Specifically, I have been working on QoS-aware

scheduling and resource/performance management techniques for high-performance and cloud

computing platforms.

Seeking a Summer 2016 systems research/development internship.

EDUCATION

Georgia Institute of Technology at Atlanta, GA Aug. 2012 - Present

* Ph.D. in Computer Science GPA: 3.86/4.0

* Advisor: Karsten Schwan

University of Tennessee at Knoxville, TN Jan. 2010 - Jul. 2012

* M.S. in Computer Science GPA: 3.94/4.0

* Advisor: Xiaorui Wang

Shandong University, Jinan China Sep. 2004 - Jun. 2008

* B.S. in Computer Science Major GPA: 90.4/100

RESEARCH & WORK EXPERIENCE

Georgia Institute of Technology at Atlanta, GA Aug. 2012 - Present

* Graduate Research Assistant

* Researching and developing advanced QoS and resource management techniques for

building more effective systems, to achieve better resource usage and application

performance.

LinkedIn Corporation, Mountain View, CA May - Aug. 2015

* Summer Intern

* Researched and designed more efficient caching policies for LinkedIn's graph system

to address the long-tail latency observed for query requests. The prototype was

evaluated on LinkedIn's experimental platform.

IBM T.J. Watson Research Center, Yorktown Heights, NY May - Aug. 2014

* Summer Intern

* Researched and designed resource adaptation policies for energy efficiency

management across multi-tenant virtualized Hadoop clusters. Preliminary results

were obtained in an OpenStack cloud environment.

Oak Ridge National Laboratory May - Aug. 2013

* Summer Intern

* Researched designs for active workflow systems managing near-real-time scientific

data analytics via adaptive stream processing techniques.

University of Tennessee at Knoxville, TN Jan. 2010 - Jul. 2012

* Graduate Research Assistant

* Developed various optimization-based systems for efficient resource and power

management of cloud-scale data centers.

Institute of Computing Technology, CAS, Beijing China Sep. 2008 - Nov. 2009

* Research Assistant

* Proposed an analytic model for VM-based data centers to help administrators gain

insight into the upper bound on the number of consolidated physical servers needed to

guarantee QoS with the same request loss probability as in dedicated servers.

RESEARCH PROJECTS

Towards Better Sharing With Latency Critical Workloads (GaTech Ongoing Project)

* Exploring new opportunities to better share resources with tail-latency critical

applications to promote more effective resource usage, without sacrificing user-

perceived SLA experience.

PartialGraph: Faster Query on Very Large-Scale LinkedIn Social Networks (LinkedIn)

* Developed more effective caching policies for LinkedIn's graph system to address

the long-tail latency observed for query requests.

* Leveraged an incremental partial-update strategy to reduce the massive network

traffic generated by users' 2nd-degree network connections. The prototype was

evaluated on LinkedIn's experimental platform.

Dynamic Energy-Efficient Resource Adaptation Across Multi-tenant Virtualized Hadoop

Clusters (IBM T.J. Watson Research Center)

* Designed resource adaptation policies to achieve best-effort energy efficiency

across multi-tenant Hadoop clusters.

* Used the First-Fit Decreasing (FFD) bin-packing algorithm to determine the destination

Vhadoop cluster to service each job.

Active Workflow System for Near Real-Time Extreme-Scale Science (GaTech and ORNL)

* Developed an active workflow system for near-real-time scientific knowledge

discovery in an era of increasingly unpredictable system runtime behavior.

* Designed a two-tier scheme in which decisions are adaptively refined online

according to the system runtime status, instead of relying solely on determining what

to run and where beforehand, or only partially dynamically. This is

enabled by embedding workflows along with data streams.

GreenWare: Greening Cloud-Scale Data Centers to Maximize the Use of Renewable

Energy (University of Tennessee at Knoxville)

* Devised an effective optimization-based model to maximize the use of renewable

energy within allowed operation budgets.

* Modeled the intermittent generation of renewable energy based on massive local

weather data, and formulated the core objective as a constrained

optimization problem. Proposed a linear-fractional programming-based algorithm

to find the optimal solution.

Electricity Bill Capping for Cloud-Scale Data Centers that Impact the Power Markets

* Proposed a two-tier optimization algorithm that minimizes electricity cost while

enforcing a cost budget on the monthly bill for cloud-scale data centers that impact

power markets.

* Modeled the impact of power demands from cloud computing systems on their local

power prices by analyzing electricity price behaviors in real-world power markets

with respect to power demands. Proposed mixed-integer programming-based techniques.

Capital One Data Analysis Challenge

* Advised Capital One on which factors significantly influence decisions on new

bank branch locations.

* Applied Principal Component Analysis (PCA) techniques to analyze the massive

dataset provided by Capital One.

Utility Analysis for Internet-Oriented Server Consolidation in VM-Based Data Centers

(Institute of Computing Technology, Chinese Academy of Sciences, China)

* Proposed an analytic model for VM-based data centers to help data center

administrators gain insight into the upper bound on the number of consolidated

physical servers needed to guarantee QoS with the same request loss probability as in

dedicated servers. The proposed model can also evaluate server consolidation in terms

of power and utility of physical servers.

* Leveraged queuing theory to model the interaction between user requests with QoS

requirements and capability flowing.

PEER-REVIEWED PUBLICATIONS

* Yanwei Zhang, Matthew Wolf, Karsten Schwan, Qing Liu, Greg Eisenhauer,

Scott Klasky, "Co-Sites: The Autonomous Distributed Dataflows in Collaborative

Scientific Discovery", the 10th Workshop on Workflows in Support of Large-Scale

Science (WORKS '15), in conjunction with SC 2015, November 15-20, 2015, Austin,

Texas, USA.

* Yanwei Zhang, Qing Liu, Scott Klasky, Matthew Wolf, Karsten Schwan, Greg

Eisenhauer, Jong Choi, Norbert Podhorszki, "Active Workflow System for Near Real-

Time Extreme-Scale Science", the 1st Workshop on Parallel Programming for Analytics

Applications (PPAA 2014).

* Yanwei Zhang, Yefu Wang, and Xiaorui Wang, "Electricity Bill Capping for Cloud-

Scale Data Centers that Impact the Power Markets", the International Conference on

Parallel Processing (ICPP 2012). (Acceptance rate: 28%)

* Yanwei Zhang, Yefu Wang, and Xiaorui Wang, "TEStore: Exploiting Thermal and

Energy Storage to Cut the Electricity Bill for Datacenter Cooling", the 8th International

Conference on Network and Service Management (CNSM 2012). (Acceptance rate:

15.5%)

* Yanwei Zhang, Yefu Wang, and Xiaorui Wang, "GreenWare: Greening Cloud-Scale

Data Centers to Maximize the Use of Renewable Energy", the 12th ACM/IFIP/USENIX

International Middleware Conference (Middleware 2011). (Acceptance rate: 19%)

* Yanwei Zhang, Yefu Wang, and Xiaorui Wang, "Capping the Electricity Cost of Cloud-

Scale Data Centers with Impacts on Power Markets", the 20th ACM International

Symposium on High-Performance Parallel and Distributed Computing (HPDC 2011). (2-

page poster; Acceptance rate: 20%)

* Yefu Wang, Xiaorui Wang, and Yanwei Zhang, "Leveraging Thermal Storage to Cut the

Electricity Bill for Datacenter Cooling", the 4th Workshop on Power-Aware Computing

and Systems (HotPower 2011), in conjunction with the 23rd ACM Symposium on

Operating Systems Principles (SOSP). (Acceptance rate: 26%)

* Ying Song, Yanwei Zhang, Yuzhong Sun, and Weisong Shi, "Utility Analysis for

Internet-Oriented Server Consolidation in VM-Based Data Centers", the 11th IEEE

International Conference on Cluster Computing (Cluster 2009).

SKILLS AND SELECTED GRADUATE COURSES

* Spoken languages: Chinese (native), English (proficient)

* Programming languages: C, Python, Java (basic)

* Data tools: R

* Platforms: Linux, Windows

* Courses: Distributed Systems, Computer Systems Architecture, Advanced Operating

Systems, Advanced Algorithms, Software Systems, Real-Time Systems, Data Mining,

Markov Chains/Computer Science, Applied Linear Algebra

HONORS AND AWARDS

* 2011 HPDC Conference Student Travel Award

* 2010-2011 University of Tennessee EECS Department Excellence Fellowship

* 2008 National Scholarship (China)

* 2008 Excellent Undergraduate of Shandong Province

* 2004-2005 Provincial-level Outstanding Student

* 2004-2007 First-class Scholarship for 3 Consecutive Years as an Outstanding Student

* 2006 President Scholarship of Shandong University