-
Notifications
You must be signed in to change notification settings - Fork 0
/
3_model.tex
78 lines (60 loc) · 6.51 KB
/
3_model.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
\section{Our Modeling Methodology}
\label{sec:model}
\vspace{10pt}
This section describes the general performance modeling ideas used. In Section~\ref{sec:eval}, %we describe how
this basic model is adapted to three application case studies on the Amazon AWS cloud. The primary interest in this paper is not in identifying the most accurate performance model but rather in exploring if diversity in the VMs used for calibrating the chosen model helps improve its efficacy.
Therefore, although numerous modeling choices exist in the literature for such applications, we use a relatively simple multiple linear regression approach because: (i) it serves as a good starting point for evaluating the hypothesis, (ii) it is easy to cast, train, and evaluate, and (ii)it works well - especially under low/moderate throughputs for the normal operating regions of well-provisioned, performance-sensitive tenant workloads.
%\subsection{Background on Multiple Linear Regression}
\noindent{\bf Multiple Linear Regression:} Given a data set \\ $\{y_i,x_{i1},\ldots,x_{ip}\}^n_{i=1}$, a linear regression on multiple independent variables $x_{p}$ and dependent variable $y$ is a set of parameters $\beta_i$ that model a linear relationship between $y$ and $x_{i}$~\cite{CMUStatsBook}.
\begin{displaymath}{
y_i = \beta_1 x_{i1}+\ldots+\beta_{p}x_{ip}+\epsilon_i
}\end{displaymath}
The parameters $\epsilon_i$ are the error terms, an unobserved random variable. The parameters $\beta_i$ are chosen to minimize the values of $\epsilon_i$ for the entire data set. Specifically, the $\beta_i$ are chosen to minimize the sum of squares $\sum_{i=1}^{n} \epsilon^2_i$.
%\subsection{Model}
%Our goal is to model the performance of different virtual machine instance types, and show a correlation between the performance of those applications that can be used to predict the performance of those applications on new virtual machine instance types.
%The performance of a server application is evaluated as the time latency $y_{L}$ for responding to a request.
We choose as our dependent variable the average latency $y_{L}$ and as our independent variables: (i) workload/application properties - throughput, degree of replication, and read/write ratio and (ii) resource capacity of the VMs being used - number of CPU cores, clock rate of each CPU, memory, network bandwidth, and type of storage (SSD vs. magnetic).
%is the dependent variable in our model. The benchmarks are run at multiple levels of throughput (requests/second), which is taken as an independent variable $x_{T}$
%Virtual machine instances vary by in many ways: the type of CPU, CPU clock speed, number of CPUs, amount of memory, type or storage (SSD vs magnetic), amount of storage, and available network bandwidth. Each of these parameters can be modeled by as an independent variable $x_p$.
We define a training set $S = \{VM_i\}^n_{i=1}$ as a set of virtual machines, each characterized by $x_p$. In each of our experiments, we run the application whose performance we wish to model on $VM_i$ for various $x_{T}$ and measure the latency $x_{L}$. We then find a multiple linear regression $M_S$ on $\{y_{iL},x_{iT},\ldots,x_{ip}\}^n_{i=1}$. (For a given instance $VM_i$, the values of $x_{ip}$ are fixed for all measurements for that instance.)
%We then consider another instance $VM_{test}\notin S$ and use $M_S$ to predict the dependent variable $y_L$.
\begin{figure}
\centering
\includegraphics[scale = 0.35]{two_regions.eps}
\caption{Two regions of latency vs. throughput for Redis. }
\label{figure:combined}
\end{figure}
\noindent{\bf Measure of Model Efficacy:} We use the predicted coefficient of multiple determination ($R^2_{predicted}$) as our measure of model accuracy which is defined as:
\theoremstyle{definition}
\newtheorem{mydef}{Definition}
\begin{mydef}
For a test instance $VM_{test}$ with $x_{test,i}$,
\begin{displaymath}
R_{predicted}^2=1-\frac{\sum_{i=1}^{n} (y_{test,i} - \hat{y}(x_{test,i}))^{2}}{\sum_{i=1}^{n} (y_{test,i} - \bar{y_{test}})^{2}}
\end{displaymath}
where
\begin{displaymath}
\hat{y}(x_{test,i})=\sum_{i=1}^{n} \beta_i x_{test,i}
\end{displaymath}
and $\bar{y}_{test}$ is the mean of $y_{test,i}$.
\end{mydef}
%\subsubsection{Input and Predicted Variables}
To see evidence supporting our hypothesis, we expect to see the following behavior: for larger training sets $S$, the model should fit better to $VM_{test}$, corresponding to an increasing $R^2_{predicted}$, assuming sufficient variability in the values of $x_p$ for $VM_j\in S$ to cover the values of $x_p$ for $VM_{test}$.
\noindent{\bf Discussion:} Our linear regression based model is known to perform poorly when queueing delays become dominant contributors to overall latency~\cite{Stewart07}.
For example, if we were to model the entire set of latency observations (for experiements done using Redis, more details
in Section~\ref{sec:redis}) using our model, we would obtain a poorer predictor than the two separate linear regression
models shown in Figure~\ref{figure:combined}, one each for the ``low/moderate'' (Region 1) and ``high'' (Region 2) throughput regions. This suggests two points: (i) using domain knowledge (e.g., the distinction between low and high throughput regions), a
tenant may be able to use linear regression to obtain better models, and (ii) more sophisticated models may be
warranted for the needs of certain tenants. Again, since our interest is in the impact of diversity on modeling accuracy,
we focus only on modeling performance in Region 1 for the rest of this paper.
It is important to keep in mind the basic assumption of linear regression about the independent variables being independent of each other (i.e., the $x_p$ for $VM_j\in S$ need to be independent). Interestingly, among the independent
variables in our model, the number of cores and memory capacity are prone to be problematic on this front - typically
larger VMs come both with more CPUs and more memory - see Figure~\ref{fig:diversity}. To overcome this problem, we attempt to choose VM types
in our experiments where this correlation is weak. Furthermore, the results we present in this paper are for a subset
of our experimental findings wherein the entire working set fits in VM memory, rendering memory moot as a predictor of
performance (we do incorporate memory in our more general experiments). Finally, the potential shortcomings of predicted
$R^2$ as a measure of model accuracy should be kept in mind when interpreting our results~\cite{CMUStatsBook}.
%}
%\subsubsection{Measure of Model Efficacy}
%\subsection{Discussion}
%\vspace{10pt}