Cloud computing is the culmination of advances in computer networks, hardware and software that collectively gave rise to computing as a utility. It offers a wide range of services to clients worldwide in a very cost-effective way, enticing users and companies to migrate their infrastructure to cloud platforms. Drawn by its enormous capacity and easy access, clients upload replicated data to the cloud, creating an unnecessary storage crunch in datacenters. Many data compression techniques have been applied, but none could serve a capacity as large as the cloud's; research has therefore focused on deduplicating data and reclaiming the existing storage capacity wasted by duplicate copies. To provide better cloud services through scalable provisioning of resources, interoperability has brought many Cloud Service Providers (CSPs) under one umbrella, termed a Cloud Federation. Many policies have been devised for private and public cloud deployment models to locate and eliminate replicated copies using hashing techniques. In a federation, however, the search for duplicate copies is not restricted to any one CSP but spans the whole set of public and private CSPs contributing to the federation. It was found that even with advanced deduplication techniques for federated clouds, owing to the different nature of CSPs, a single file may be stored in both the private and the public group of the same federation; this can be handled if an optimized deduplication strategy addresses the issue. Therefore, this study aims to further optimize a deduplication strategy for the federated cloud environment and proposes a central management agent for the federation.
Since no directly relevant work was found, this paper implements the concept of a federation agent and uses file-level deduplication to accomplish this approach.
Cloud computing is a rapidly progressing technology, stretching its wings in all directions owing to its most applauded feature, “pay-as-you-use” [
This paper is structured into sections for better presentation of the study. Section 2 presents a detailed literature survey on the advancement of cloud federation and the potential of implementing deduplication in it. Section 3 presents the research gap and the existing system, while Section 4 describes the proposed system, which implements the concept of a federation agent in an optimized deduplication approach for the federated cloud environment. Section 5 records observations and discusses the results. Section 6 concludes the paper and sheds light on the future scope of this work.
Recently, much research has been conducted on interoperability in cloud computing, bringing various deployment models under one coalition to make optimal use of cloud resources. This literature survey is divided into two sections to trace the progressive metamorphosis of cloud federation and deduplication.
In the modern context, limited physical resources are the primary limitation of clouds. If a cloud has exhausted all its computational and storage resources, it cannot provide service to its clients [
Authors in [
With the proliferation of handheld electronic devices and pervasive internet access, a gigantic amount of data is being generated and stored in various cloud storages. Much of it is the same data replicated at and from various locations. Even the regular backups maintained by organizations, and the identical files created by different departments of an organization, lead to data replication and hence wasted storage space. Deduplication is the technique by which only a single instance of data is stored, while a logical reference to this single copy is passed whenever the data is required in future, avoiding storage wastage. Deduplication techniques are categorized on the basis of time, location and granularity: granularity-based deduplication operates at file level or block level; location-based deduplication is performed at the source side or the destination side; and, with respect to time, deduplication can be an inline or a post-process operation [
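The single-instance idea behind deduplication can be sketched as follows (a minimal illustration, not the paper's implementation; the `DedupStore` name and structure are hypothetical):

```python
import hashlib

class DedupStore:
    """Toy file-level deduplication store: one physical copy per unique content."""
    def __init__(self):
        self.blobs = {}   # content hash -> file content (single stored instance)
        self.refs = {}    # content hash -> logical references (file names)

    def put(self, name: str, content: bytes) -> str:
        digest = hashlib.sha256(content).hexdigest()
        if digest not in self.blobs:
            self.blobs[digest] = content          # first copy: store the data
        self.refs.setdefault(digest, []).append(name)  # always record a reference
        return digest

store = DedupStore()
store.put("report_q1.doc", b"quarterly figures")
store.put("backup/report_q1.doc", b"quarterly figures")  # duplicate content
# Only one physical copy is kept, with two logical references to it.
```

Block-level deduplication would apply the same hashing per fixed-size or content-defined chunk instead of per whole file.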
The literature survey revealed that many deduplication policies have been devised for public and private cloud deployment models using various hashing techniques. Owing to the need for load sharing in cloud computing, interoperability has brought many CSPs, private or public, bound by Federation Level Agreements (FLAs), under an alliance known as a Federated Cloud. During deduplication, therefore, the search for duplicate copies is made across all participating CSPs, whether in the private or the public group, instead of within a single group. However, even in the optimized deduplication technique for federated clouds, owing to the different nature of CSPs, a single file is stored in both the private group and the public group of the same federation; this can be further optimized if central management of the federation is possible. A close study of the technical literature shows that little work has been done in this area, leaving a considerable research gap and ample scope for further research.
In the existing system (as shown in
Upon assessing the existing system, it was concluded that a better deduplication policy can be devised by deploying Bloom filters to check the membership of an element across the federated cloud environment instead of within individual public/private cloud groups [
a) In private CSPs, the same file can be stored only once per user, whereas in public CSPs it is stored only once irrespective of the user, as a reference can be passed for all other entries.
b) On deleting a file that has many references in public CSPs, only the logical pointer for that particular user is deleted, not the original file.
The proposed optimized deduplication strategy yields excellent results, reclaiming storage from duplicated data by saving each unique file in the private CSPs and a single copy of each file in the public CSPs. Moreover, by implementing Bloom filters, the total time for lookup operations is reduced in addition to the space.
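The Bloom filter underlying this membership check can be sketched as follows (a minimal illustration; the array size m, hash count k, and salting scheme are arbitrary choices here, not those of the cited work):

```python
import hashlib

class BloomFilter:
    """Probabilistic membership test: no false negatives, tunable false positives."""
    def __init__(self, m: int = 1024, k: int = 3):
        self.m, self.k = m, k
        self.bits = [0] * m

    def _positions(self, item: str):
        # Derive k bit positions from k salted SHA-256 digests of the item.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, item: str):
        for p in self._positions(item):
            self.bits[p] = 1

    def might_contain(self, item: str) -> bool:
        # False => definitely absent; True => possibly present (check the table).
        return all(self.bits[p] == 1 for p in self._positions(item))

bf = BloomFilter()
bf.add("#abc")   # e.g. the hash code of an uploaded file
```

Because a negative answer is definitive, a Bloom filter lets the system skip the expensive table lookup for most new files; a positive answer still requires confirming against the actual index.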
Although the previously proposed system (an optimized data deduplication strategy for the federated cloud environment using Bloom filters) showed exceptional results, reclaiming memory from cloud storage while decreasing the total lookup time, it can be advanced to gain further optimization. In that system, a unique file can be saved in the private cloud group (CSP1 and CSP2) and the same file can also be saved in the public cloud group (CSP3, CSP4, CSP5). Therefore, even within the same federation, due to the different nature of CSPs, a single file is stored twice: once in the private group and once in the public group.
Deep analysis of the technical literature suggests that this issue can be managed if the central management of the federation is governed by one agent that checks every entry made in any CSP of the federation; this agent is referred to as the Federation Agent (as shown in
S. No. | Existing system | Proposed system |
---|---|---|
1 | Deduplication has been implemented on either private or public clouds | The deduplication technique has been optimized to accommodate the federated cloud environment |
2 | All files stored by private users are stored | Only unique files stored by private users are stored |
3 | Files from public users are not checked against entries made by private users | Files from public users are checked against the federation table; if an entry exists, only a reference to the existing file is passed |
4 | Because the same file is stored in different CSPs, total memory consumption is higher and the redundancy factor is also high | With the federation agent (central management agent), total memory consumption and the redundancy factor are lower than in the existing system |
An optimized deduplication strategy has been proposed, and the methodology for the events is given below:
1. Simulation is initialized by starting CloudSim 3.0.3 package that creates the datacenter broker, virtual machines, cloudlets and Federation Agent.
Let cloud be a set of elements represented as follows:
cloud= {Dn, Nbw, Dm, Br}
where Dn is number of devices, Nbw is Network bandwidth, Dm is the Deployment Model and Br is the Broker.
Further, Devices be a set of elements represented as,
Devices = {Nodes, Switches, Storage, Controller}
Storage be a set of elements represented as Storage = {fa….fm} where fa to fm are set of files stored in datacenter.
Therefore, federation of clouds can be represented as
FedC = {{Pvt1, Pvt2} {Pub3, Pub4, Pub5}}
where Pvt1 and Pvt2 are Private Clouds and Pub3, Pub4, Pub5 are Public CSPs
For instance, any file (fn) belonging to any user (Un) can be represented as Unfn.
Hence, for a private CSP, Unfn ⊆ {CSP1 || CSP2}, and for a public CSP, Unfn ⊆ {CSP3 || CSP4 || CSP5},
where CSP1 and CSP2 are Private CSPs and CSP3, CSP4, CSP5 are Public CSPs.
2. Option 1: Private access: When an authorized user signs in to a chosen private CSP, the user's credentials are used to encrypt the selected file, and this encrypted file is sent to the server.
After Logging in by user U1, a key is generated based on his attributes U1 → keyG
A file f1 is selected for uploading and f1 is encrypted with KeyG → (Enc(f1))
The encrypted file Enc(f1) is transferred to cloud server and then a hash value is computed Hash(Enc (f1))
Bloom Filters, BF(), are used for checking the membership of an element. This function operates over an array of positions 0 to n−1, where n is the length of the array.
FA = Hash(Enc(f1))
For deduplication using Bloom Filters, a lookup operation is conducted against generated hash.
If found == 0
set.add ( element (f1))
f1 → {CSP1 || CSP2}
Storage = {fa…fm, f1}
Else
f1⊆ {CSP1 || CSP2}
FA.update (&reference (f1)) //‘&reference’ is passed instead of saving the file
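The private-access steps above can be sketched as follows (an illustrative Python sketch, not the simulated implementation: the XOR cipher stands in for the real encryption, a set of bit positions stands in for the Bloom filter, and the 'Denied' case for a repeat upload by the same user is omitted for brevity):

```python
import hashlib

def derive_key(credentials: str) -> bytes:
    # KeyG: toy key derivation from the user's attributes (illustrative only).
    return hashlib.sha256(credentials.encode()).digest()

def encrypt(data: bytes, key: bytes) -> bytes:
    # Toy XOR stream standing in for Enc(); not a secure cipher.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

bloom_bits = set()   # stand-in Bloom filter: the set bit positions
storage = {}         # hash -> ciphertext actually stored in {CSP1 || CSP2}
references = []      # logical '&reference' entries passed instead of storing

def positions(h: str, m: int = 1024, k: int = 3):
    return [int(hashlib.sha256(f"{i}:{h}".encode()).hexdigest(), 16) % m
            for i in range(k)]

def private_upload(user: str, fname: str, data: bytes) -> str:
    key = derive_key(user)                  # U1 -> KeyG
    ct = encrypt(data, key)                 # Enc(f1)
    h = hashlib.sha256(ct).hexdigest()      # Hash(Enc(f1))
    pos = positions(h)
    if not all(p in bloom_bits for p in pos):   # lookup: found == 0
        bloom_bits.update(pos)              # set.add(element(f1))
        storage[h] = ct                     # f1 -> {CSP1 || CSP2}
        return "Stored"
    references.append((user, fname, h))     # pass '&reference' instead
    return "&Reference"

first = private_upload("User1", "file1.doc", b"contents of file1")
second = private_upload("User1", "file1.doc", b"contents of file1")
```

Note that because the key is derived per user, two different users encrypting the same file would produce different ciphertexts and hashes in this sketch; aligning the hash codes across users (as in the paper's tables) requires a deterministic, content-keyed scheme such as the public-access option below.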
Option 2: Public access: When an authorized user logs in to the selected public CSP, the name of the file to be uploaded is sent to the server, where a key is generated; this key is then sent back to the client.
The key received at client side is used to encrypt the selected file.
After logging in, the anonymous (public) user selects a file f1 for uploading; the server generates KeyH from the file name and returns it to the client.
This KeyH is used to encrypt the file f1, which is transferred to the cloud server, where a hash value is computed.
Bloom Filters, BF(), are used for checking the membership of an element, for positions 0 to n−1, where n is the length of the array.
For deduplication using Bloom Filters, lookup operation is conducted against generated hash.
Dedup()
{
If found == 0
set.add (element)
f1 → {CSP3 || CSP4 || CSP5}
Else
f1 ⊆ {CSP3 || CSP4 || CSP5}
//‘&reference’ is updated in the status instead of saving the file again
CSPk.update (f1)
Where CSPk may be CSP3, CSP4 or CSP5.
}
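The public-access steps can be sketched similarly (illustrative only; the assumption that KeyH is derived from the file name with a server-side secret follows the description above, and the XOR cipher again stands in for real encryption):

```python
import hashlib

SERVER_SECRET = b"demo-secret"   # hypothetical server-side secret

def server_key(fname: str) -> bytes:
    # KeyH generated at the server from the file name (assumed derivation).
    return hashlib.sha256(SERVER_SECRET + fname.encode()).digest()

def encrypt(data: bytes, key: bytes) -> bytes:
    # Toy XOR stream standing in for the real cipher; not secure.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

public_index = {}   # hash -> the CSP holding the single stored copy

def public_upload(fname: str, data: bytes, csp: str) -> str:
    ct = encrypt(data, server_key(fname))
    h = hashlib.sha256(ct).hexdigest()
    if h not in public_index:        # Dedup(): found == 0
        public_index[h] = csp        # f1 -> {CSP3 || CSP4 || CSP5}
        return "Stored"
    return "&Reference"              # update '&reference', do not save again

a = public_upload("File1", b"payload", "CSP3")
b = public_upload("File1", b"payload", "CSP4")
```

Because the key depends only on the file name (not the user), identical uploads from different public users produce identical ciphertexts and hashes, which is what makes cross-user deduplication possible in the public group.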
3. Federation Agent FA(), which feeds on Bloom Filters BF(), searches the file in the federation table and checks for the replicated file.
The file is then handled as follows:
Dedup()
{
If found == 0
{
FA.update (Unfn)
set.add (element)
Unfn.. → {FedC}
Unfn.. ⊆ {FedC}
}
Else
Unfn ⊆ FA //‘&reference’ is updated in the status instead of saving the file again
}
4. If the file does not exist, a new file is stored; otherwise, a logical reference is passed.
Experimental results of the proposed system are stated in
Sr. No. | Name of the file | User_Id | #code | Cloud_name | Action_status |
---|---|---|---|---|---|
1 | File1 | User1 | #abc | CSP1 | Stored |
2 | File1 | User2 | #abc | CSP1 | &Reference |
3 | File1 | User1 | #abc | CSP1 | Denied |
4 | File1 | User1 | #abc | CSP2 | &Reference |
5 | File2 | User1 | #def | CSP1 | Stored |
6 | File3 | – | #ghi | CSP3 | Stored |
7 | File1 | – | #abc | CSP3 | &Reference |
8 | File2 | – | #def | CSP3 | &Reference |
9 | File1 | – | #abc | CSP4 | &Reference |
Actions performed during deduplication in federated clouds using the Federation Agent:

- When user ‘User1’ uploads file1 (hash code #abc) for the first time to private CSP ‘CSP1’, file1 is stored and the status is shown as ‘Stored’.
- When user ‘User2’ uploads file1 (#abc) to CSP1, a reference to file1 is passed instead of saving the same file again, and the status is ‘&Reference’.
- When user ‘User1’ uploads file1 (#abc) to CSP1 again, the status is ‘Denied’, as the same file already exists at CSP1 for the same user.
- When user ‘User1’ uploads file1 (#abc) to private CSP ‘CSP2’, a reference to file1 is passed and the status is ‘&Reference’.
- When user ‘User1’ uploads file2 (#def) for the first time to CSP1, file2 is stored and the status is ‘Stored’.
- When a public user uploads file3 (#ghi) for the first time to public CSP ‘CSP3’, file3 is stored and the status is ‘Stored’.
- When a public user uploads file1 (#abc) to CSP3, a reference to file1 is passed and the status is ‘&Reference’.
- When a public user uploads file2 (#def) to CSP3, a reference to file2 is passed and the status is ‘&Reference’.
- When a public user uploads file1 (#abc) to public CSP ‘CSP4’, a reference to file1 is passed and the status is ‘&Reference’.
By implementing the Federation Agent, which centrally manages all transactions in a common federation table, the system is aware of the existence of any file in any of its participating CSPs, private or public. As a result, a single copy of each file is stored and only a reference is passed for all other transactions involving the same file.
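This walkthrough can be replayed with a small simulation of the decision rule inferred from the table ('Denied' for a repeat upload of the same file by the same private user to the same CSP; '&Reference' whenever the hash code already exists anywhere in the federation table; names here are illustrative):

```python
def federation_agent():
    table = []  # common federation table: (code, user, csp, action)

    def upload(user, code, csp):
        # Repeat upload of the same file by the same private user to the same CSP.
        if user is not None and any(c == code and u == user and s == csp and a != "Denied"
                                    for c, u, s, a in table):
            action = "Denied"
        # File already known anywhere in the federation: pass a reference.
        elif any(c == code for c, _, _, _ in table):
            action = "&Reference"
        else:
            action = "Stored"
        table.append((code, user, csp, action))
        return action

    return upload

upload = federation_agent()
actions = [
    upload("User1", "#abc", "CSP1"),  # row 1
    upload("User2", "#abc", "CSP1"),  # row 2
    upload("User1", "#abc", "CSP1"),  # row 3
    upload("User1", "#abc", "CSP2"),  # row 4
    upload("User1", "#def", "CSP1"),  # row 5
    upload(None,    "#ghi", "CSP3"),  # row 6 (public user)
    upload(None,    "#abc", "CSP3"),  # row 7
    upload(None,    "#def", "CSP3"),  # row 8
    upload(None,    "#abc", "CSP4"),  # row 9
]
```

Replaying the nine uploads from the table reproduces the Action_status column: Stored, &Reference, Denied, &Reference, Stored, Stored, &Reference, &Reference, &Reference.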
The outcomes of the proposed system suggest that this strategy is very beneficial in reclaiming cloud storage by removing duplicated data, as shown in the
S. No. | Name of the file | File size | User_id | Cloud_name | Action_status | Space after deduplication |
---|---|---|---|---|---|---|
1 | File1 | 100 | User1 | CSP 1 | Stored | 100 |
2 | File1 | 100 | User2 | CSP 1 | Stored | 100 |
3 | File1 | 100 | User1 | CSP 1 | Denied | – |
4 | File1 | 100 | User1 | CSP 2 | &Reference | – |
5 | File2 | 200 | User1 | CSP 1 | Stored | 200 |
S. No. | Name of the file | File size | User_id | Cloud_name | Action_status | Memory after deduplication |
---|---|---|---|---|---|---|
1. | File11 | 200 | – | CSP3 | Stored | 200 |
2. | File11 | 200 | – | CSP3 | &Reference | – |
3. | File11 | 200 | – | CSP4 | &Reference | – |
4. | File12 | 300 | – | CSP3 | Stored | 300 |
5. | File1 | 100 | – | CSP3 | Stored | 100 |
6. | File2 | 200 | – | CSP3 | Stored | 200 |
7. | File1 | 100 | – | CSP4 | &Reference | – |
As shown in the
As shown in the
As shown in the
As shown in the
As shown in the
S. No. | Name of the file | File_size | User_id | Cloud_name | Action_status | Memory after Deduplication |
---|---|---|---|---|---|---|
1 | File1 | 100 | User1 | CSP1 | Stored | 100 |
2 | File1 | 100 | User2 | CSP1 | &Reference | – |
3 | File1 | 100 | User1 | CSP1 | Denied | – |
4 | File1 | 100 | User1 | CSP2 | &Reference | – |
5 | File2 | 200 | User1 | CSP1 | Stored | 200 |
6 | File11 | 200 | – | CSP3 | Stored | 200 |
7 | File11 | 200 | – | CSP3 | &Reference | – |
8 | File11 | 200 | – | CSP4 | &Reference | – |
9 | File12 | 300 | – | CSP3 | Stored | 300 |
10 | File1 | 100 | – | CSP3 | Stored | – |
11 | File2 | 200 | – | CSP3 | Stored | – |
12 | File1 | 100 | – | CSP4 | &Reference | – |
As shown in the
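The reported savings can be cross-checked arithmetically against the tables above (per-row sizes taken from the table data; the federated figure computes to 57.89%, close to the reported 57.84%, the small difference presumably reflecting rounding in the original):

```python
# File sizes per upload request, and sizes actually written after deduplication.
# Private CSPs (existing system): 'Denied' and '&Reference' rows write nothing.
private_requested = [100, 100, 100, 100, 200]
private_stored    = [100, 100, 0,   0,   200]
# Public CSPs (existing system).
public_requested  = [200, 200, 200, 300, 100, 200, 100]
public_stored     = [200, 0,   0,   300, 100, 200, 0]
# Federated table with the Federation Agent: one copy per file across the federation.
fed_requested     = private_requested + public_requested
fed_stored        = [100, 0, 0, 0, 200, 200, 0, 0, 300, 0, 0, 0]

def savings(requested, stored):
    """Percentage of requested storage reclaimed by deduplication."""
    return round(100 * (sum(requested) - sum(stored)) / sum(requested), 2)

print(savings(private_requested, private_stored))  # 33.33
print(savings(public_requested, public_stored))    # 38.46
print(savings(fed_requested, fed_stored))          # 57.89
```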
This paper presents the concept of the Federation Agent to address the shortcomings of the existing system and further improve the deduplication strategy in the cloud environment. The Federation Agent centrally manages all transactions in a common federation table and is therefore aware of the existence of any file in any of its participating CSPs, private or public. The proposed strategy has been simulated in CloudSim 3.0.3 to implement and test the algorithm. The outcomes suggest that this strategy is very beneficial in reclaiming cloud storage by removing duplicated data and conserving a considerable amount of memory. Empirically, the system without the Federation Agent reclaims 33.3% of memory from private CSPs and 38.46% from public CSPs compared with the existing system, whereas the proposed system with the Federation Agent conserves 57.84% of cloud storage. This study used file-level deduplication; in future, block-level deduplication techniques can be investigated for the federated cloud environment, and further alternatives can be sought for better indexing and faster lookup operations for deduplication in federated clouds.
This paper and the research behind it would not have been possible without the exceptional support of my supervisor and research colleagues.