COST OPTIMIZATION FOR DYNAMIC REPLICATION AND MIGRATION OF DATA IN CLOUD DATA CENTERS
Cloud Storage Providers (CSPs) offer geographically data stores providing several storage classes with different prices.An important problem facing by cloud users is how to exploit these storage classes to serve an application with a time-varyingworkload on its objects at minimum cost. This cost consists of residential cost (i.e., storage, Put and Get costs) and potential migrationcost (i.e., network cost). To address this problem, we first propose the optimal offline algorithm that leverages dynamic and linearprogramming techniques with the assumption of available exact knowledge of workload on objects. Due to the high time complexity ofthis algorithm and its requirement for a priori knowledge, we propose two online algorithms that make a trade-off between residentialand migration costs and dynamically select storage classes across CSPs. The first online algorithm is deterministic with no need ofany knowledge of workload and incurs no more than 2 1 times of the minimum cost obtained by the optimal offline algorithm,where is the ratio of the residential cost in the most expensive data store to the cheapest one in either network or storage cost. Thesecond online algorithm is randomized that leverages “Receding Horizon Control” (RHC) technique with the exploitation of availablefuture workload information for w time slots. This algorithm incurs at most 1 + w times the optimal cost. The effectiveness of theproposed algorithms is demonstrated through simulations using a workload synthesized based on characteristics of the Facebookworkload.
the following five main categories.Using multiple cloud services. Reliance on a singlecloud provider results in three-folds obstacles: availability ofservices, data lock-in, and non-economical use . To alleviatethese obstacles, one might use multiple cloud providersthat offer computing, persistent storage, and network serviceswith different features such as price and performance. Being inspired by these various features, automaticselection of cloud providers based on their capabilities anduser’s specified requirements are proposed to determinewhich cloud providers are suitable in the trade-offs suchas cost vs. latency and cost vs. performance .Several previous studies attempted to effectively leveragemultiple CSPs to store data across them. RACS utilized Erasure Coding to minimize migration cost if eithereconomic failure, outages, or CSP switching happens.Hadji proposed various replica placement algorithmsto enhance availability and scalability for encrypted datachunks while optimizing the storage and communicationcost. None of these systems explore minimizing costby exploiting pricing differences across different cloudproviders with several storage classes when dynamic migrationof objects across CSPs is a choice.
Our study is motivated by these pioneer studies asnone of them can simultaneously answer the aforementionedquestions (i.e., placements and migration times of objects).To address these questions, we make the followingkey contributions:_ First, by exploiting dynamic programming, we formulateoffline cost optimization problem in which theoptimal cost of storage, Get, Put, and migration is calculatedwhere the exact future workload is assumedto be known a priori._ Second, we propose two online algorithms to findnear-optimal cost as shown experimentally. The firstalgorithm is a deterministic online algorithm with thecompetitive ratio (CR) of 2 1, where is the ratio ofthe residential cost in the most expensive DCs to thecheapest ones either in storage or network price. Thesecond algorithm is a randomized online algorithm withthe CR of 1 + w, where w is the available look aheadwindow size for the future workload. We also analysethe cost performance of the proposed algorithms inthe forms of CR that indicates how much cost in theworst case the online algorithms incur as compared tothe offline algorithm._ In addition to the theoretical analysis, an extensivesimulation-based evaluation and performance analysisof our algorithms are provided in the CloudSimsimulator using the synthesized workload basedon Facebook workload specifications.
CONCLUSIONS AND FUTURE WORK
To minimize the cost of data placement for time-varyingworkload applications, developers must optimally exploitthe price difference between storage and network servicesacross multiple CSPs. To achieve this goal, we designedalgorithms with full and partial future workload information.We first introduced an optimal offline algorithmto minimize the cost of storage, Put, Get, and potentialmigration, while satisfying eventual consistency and latency.Due to the high time complexity of this algorithmcoupled with possibly unavailable full knowledge of thefuture workload, we proposed two online algorithms withprovable performance guarantees. One is deterministicwith the competitive , where is the ratio ofresidential cost in the most expensive data centers to thecheapest ones either in storage or network price. The otherone is a randomized with the competitive ratio of 1+ w,where w is the size of available look-ahead windows ofthe future workload. Large scale simulations driven by asynthetic workload based on the specification of Facebookworkload indicate that the cost savings can be expectedusing the proposed algorithms under the prevailing Amazon’s,Microsoft’s and Google’s cloud storage servicesprices. As future work, we plan to propose algorithms inwhich the requested availability of objects in terms of thenumber of nines is also considered.
 S. Muralidhar et al., “f4: Facebook’s warm blob storage system,”in 11th USENIX Symposium on Operating Systems Design andImplementation (OSDI 14). Broomfield, CO: USENIX Association,Oct. 2014, pp. 383–398.
 G. Skourletopoulos et al., “An evaluation of cloud-based mobileservices with limited capacity: a linear approach,” Soft Computing,pp. 1–8, 2016.
 A. Bourdena et al., “Using socio-spatial context in mobile cloudoffload process for energy conservation in wireless devices,”IEEE Transactions on Cloud Computing, vol. PP, no. 99, pp. 1–1,2016.
 A. Kathpal et al., “Analyzing compute vs. storage tradeoff forvideo-aware storage efficiency,” in Proceedings of the 4th USENIXConference on Hot Topics in Storage and File Systems (HotStorage’12).Berkeley, CA, USA: USENIX Association, 2012, pp. 13–13.
 D. Bermbach et al., “Metastorage: A federated cloud storagesystem to manage consistency-latency tradeoffs,” in Proceesdingsof IEEE CLOUD (CLOUD’11), 2011, pp. 452–459.
 K. P. Puttaswamy et al., “Frugal storage for cloud file systems,”in Proceedings of the 7th ACM European Conference on ComputerSystems (EuroSys’12). New York, NY, USA: ACM, 2012, pp. 71–84.
 Z. Wu et al., “Spanstore: Cost-effective geo-replicated storagespanning multiple cloud services,” in Proceedings of theTwenty-Fourth ACM Symposium on Operating Systems Principles(SOSP’13). New York, NY, USA: ACM, 2013, pp. 292–308.
 Y. Wu et al., “Scaling social media applications into geodistributedclouds,” IEEE/ACM Trans. Netw., vol. 23, no. 3, pp.689–702, June 2015.
 R. N. Calheiros et al., “Cloudsim: A toolkit for modeling andsimulation of cloud computing environments and evaluation ofresource provisioning algorithms,” Softw. Pract. Exper., vol. 41,no. 1, pp. 23–50, Jan. 2011.
 B. Atikoglu et al., “Workload analysis of a large-scale keyvaluestore,” in Proceedings of the 12th ACM SIGMETRICS/PERFORMANCE Joint International Conference on Measurementand Modeling of Computer Systems (SIGMETRICS’12). NewYork, NY, USA: ACM, 2012, pp. 53–64.