Cloud computing offers affordable, scalable, location-independent services over the internet. It is a promising and cost-effective approach that supports big data analytics and advanced applications when business continuity is under pressure, for instance during a pandemic. Handling massive volumes of information requires clusters of servers that can process large quantities of data at high velocity and under varied configurations. A data deduplication model enables cloud users to manage their cloud storage space efficiently by removing redundant data stored on the server. Data deduplication also saves network bandwidth. In this paper, a new cloud-based big data security technique utilizing dual encryption is proposed. A clustering model is used to analyze the hash-function-based deduplication process. Multi kernel Fuzzy C means (MKFCM) clusters the data stored in the cloud as the basis for the confidential data encryption procedure. The selected confidential data is encrypted homomorphically, after which the Optimal SIMON Cipher (OSC) technique is applied. This dual encryption, combined with the optimization model, improves the performance of the security mechanism. The technique was evaluated by comparing it with other encryption and clustering techniques. The results showed that the proposed technique achieved the highest accuracy and the lowest encryption time.
Cloud computing has become a challenging platform that has a considerable impact on the Information Technology (IT) industry and on community activities [
Cloud data safety should be guaranteed, and servers in an organization must be ready to be upgraded for very large storage zones, especially when data encryption techniques are used. However, a major problem is that servers cannot deduplicate stored information effectively when there is an excess of encrypted data [
In this paper, cloud big data security is proposed using a dual encryption technique. The study uses a clustering model to analyze the hash-function-based deduplication process. Multi kernel Fuzzy C Means (MKFCM) clusters the data, which is then stored in the cloud on the basis of the confidential data encryption procedure. The selected confidential data is then encrypted homomorphically and with the Optimal SIMON Cipher (OSC) technique. This secure dual encryption technique, combined with the optimization model, enhances performance. Finally, the findings of the work are compared with other encryption and clustering techniques.
Venila et al. [
Celcia et al. [
Sookhak et al. [
Miao et al. [
A message-locked encryption model, termed block-level message-locked encryption (BL-MLE), was proposed by a few researchers. Its block labels and encapsulated block keys require extra metadata storage space. In addition, comparing block labels incurs a large computational overhead in BL-MLE [
Cloud-of-Clouds is a deduplication-assisted essential storage model [
Pietro et al. [
A Cloud Service Provider (CSP) offers storage as a service, but it cannot be relied upon completely because the content of the stored information is of interest to many parties. At the same time, the CSP is expected to handle data storage honestly while increasing its business benefits.
Conventional privacy-preserving storage is inefficient and scales poorly with huge data sets.
Conventional privacy protection schemes focus on a single issue rather than tackling the whole problem; in terms of security and privacy in particular, these schemes are unrealistic. Custom-made privacy-preserving schemes are costly and difficult to implement.
Reducing the computational load on the managing party and achieving lightweight authenticator generation are major problems.
Earlier research works carried out the clustering process manually, which also led to mismatched information. Moreover, conventional methods compromise both the privacy and the security of information.
The main aim of the proposed model is to store big datasets in the cloud under high security. To this end, a deduplicated dataset is produced and then passed through data encryption and decryption processes. First, the dataset is encrypted using a mapper, which generates a key. The mapper then arranges the dataset into groups, which are fed into deduplication according to their matching score. Deduplication prevents multiple entries or repetitive data, and its stages are carried out according to the amount of duplication at issue. Subsequently, the datasets are subjected to clustering. In this process, MKFCM is used to cluster the data, which allows a data item to belong to two or more groups. The clustered data are then encrypted through dual encryption, i.e., homomorphic encryption and the Optimal SIMON Cipher (OSC). For decryption, the encrypted data is accessed securely, since authorized data holders can retrieve it with the help of symmetric keys. To enhance the performance of the SIMON cipher, a Modified Krill Herd Optimization (KHO) method is used. The Modified KHO method builds the encryption key and provides the strongest privacy preservation for the information. When a value is equal to or greater than the threshold value, the output is forwarded to the reducer. The reducer then stores the information on the cloud server. The proposed technique is shown in
The Map-Reduce system is used for big data analytics across different servers. The Map-Reduce algorithm comprises two main tasks, described below.
Map phase: In this phase, the data is obtained as input and divided among the map tasks, and each of the M map tasks processes its share of comparable data.
Reduce phase: In this phase, the data from the preceding phase is clustered and separated using Standard Component Analysis (SCA). The reduce task then takes the output of the map phase as its input and combines those data tuples into a smaller set of tuples.
Input: Database involved
Map: Data sorting and filtering
Reduction: Similarity and missing values are redistributed to the mapped data.
Output: Information reduction
For the most part, a catalog is the outcome that Map-Reduce clients want to obtain. The map and reduce functions are specified by clients according to the particular application.
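The map, sort and reduce stages listed above can be sketched in a few lines of Python. This is only a single-process illustration under assumed names (`map_phase`, `shuffle`, `reduce_phase`) and toy records; it does not reproduce the distributed framework or the SCA step used in the paper.

```python
from collections import defaultdict

def map_phase(records):
    # Map: filter out incomplete records and emit (key, value) pairs.
    for key, value in records:
        if value is not None:
            yield key, value

def shuffle(pairs):
    # Group the mapped values by key; sorting gives a deterministic order.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reduce_phase(grouped):
    # Reduce: merge each group of tuples into one smaller summary tuple
    # (count of values, sum of values).
    return {key: (len(values), sum(values)) for key, values in grouped}

# Hypothetical input: (key, value) records, one of them incomplete.
records = [("a", 1), ("b", 2), ("a", 3), ("b", None), ("c", 5)]
result = reduce_phase(shuffle(map_phase(records)))
print(result)
```

In a real deployment the map and reduce functions would run on separate workers; the grouping step in between is what lets each reducer see all values for one key.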
Data de-duplication is an optimization process used to eliminate redundant data. In this process, each data owner is identified and identical data is uploaded only once. A single copy of the duplicated data is then shared, whereas the duplicate copies in storage are removed. Data deduplication basically refers to a mechanism that replaces identical data in a file, or identical regions of a document (comparable data), with the data already stored on disk. This cloud data deduplication process is performed using a hash function. In this process, the initial document is used first, after which the pre-processed data is used to create the confirmed data. Data duplication occurs when the data owner saves data that is already saved on the CSP, as shown in
Hash-based data de-duplication strategies use a hashing algorithm to identify 'chunks' of information. The deduplication process removes blocks that are not unique. The actual process of data deduplication can be executed in a number of distinct ways. Duplicate data is basically eliminated by comparing two documents and retaining one established dataset, so that the other dataset, which is of no further use, is erased. Hash-based de-duplication breaks data into either variable-length or fixed-length 'chunks' [
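As a minimal sketch of the fixed-length variant described above, the following Python code splits data into chunks, hashes each chunk and stores every distinct chunk once. SHA-256, the chunk size and the function names are assumptions chosen for illustration; the paper does not state which hash function or chunk size it uses.

```python
import hashlib

def chunk_fixed(data: bytes, size: int):
    # Split the data into fixed-length chunks (the last one may be shorter).
    return [data[i:i + size] for i in range(0, len(data), size)]

def deduplicate(data: bytes, size: int = 4):
    store = {}   # digest -> unique chunk, stored only once
    recipe = []  # ordered digests needed to rebuild the original data
    for chunk in chunk_fixed(data, size):
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # keep the first copy, skip repeats
        recipe.append(digest)
    return store, recipe

def restore(store, recipe):
    # Rebuild the original data from the stored unique chunks.
    return b"".join(store[digest] for digest in recipe)

data = b"ABCDABCDEFGHABCD"
store, recipe = deduplicate(data, size=4)
print(len(recipe), "chunks referenced,", len(store), "chunks stored")
```

Only the digest list grows with repeated content, so storage savings depend on how often identical chunks recur.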
Clustering techniques are a main element of such systems and are used to group items according to the similarity among the features of the data objects. Clustering improves the performance of the verification procedure, and the clustering procedure runs until a stopping criterion is met.
The fuzzy C-means clustering algorithm extends the hard C-means algorithm and is used for object classification and image region clustering. As in the hard K-means algorithm, minimizing a criterion function is the fundamental element of fuzzy C-means clustering. The following equation gives the objective function.
where, the real number
Initialize the duplicated data cluster centers.
Determine D-distance among cluster centers and data using the equation given below:
Compute the function of fuzzy membership
Based on membership function, calculate fuzzy centers or centroid as:
The FCM algorithm repeats these steps until the degrees of membership are sufficiently accurate, which determines the number of completed iterations.
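The steps above can be sketched in plain Python using the standard fuzzy C-means updates, with membership u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)) and centres computed as membership-weighted means. The initialization from the first points, the tolerance and the sample data are assumptions made for this illustration, not settings taken from the paper.

```python
import math

def fcm(points, n_clusters, m=2.0, max_iter=100, tol=1e-6):
    # Step 1: initialise the cluster centres (assumption: use the first points).
    centers = [list(p) for p in points[:n_clusters]]
    memberships = []
    power = 2.0 / (m - 1.0)
    for _ in range(max_iter):
        # Step 2: distance between every point and every centre
        # (clamped away from zero to avoid division by zero).
        dist = [[max(math.dist(p, c), 1e-12) for c in centers] for p in points]
        # Step 3: fuzzy membership of every point in every cluster.
        memberships = [
            [1.0 / sum((d_ij / d_ik) ** power for d_ik in row) for d_ij in row]
            for row in dist
        ]
        # Step 4: new centres as membership-weighted means of the points.
        new_centers = []
        for j in range(n_clusters):
            weights = [u[j] ** m for u in memberships]
            total = sum(weights)
            new_centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(len(points[0]))
            ])
        # Step 5: stop once the centres no longer move noticeably.
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships

# Hypothetical data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (7.8, 8.2), (0.9, 1.1), (8.1, 7.9)]
centers, memberships = fcm(points, n_clusters=2)
print(centers)
```

Each row of `memberships` sums to one, which is what allows a data item to belong partly to more than one cluster.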
In the proposed duplicated data clustering process, multiple kernels are considered, i.e., Gaussian model-based kernels such as K1 and K2. The proposed multiple kernel fuzzy c-means (MKFCM) algorithm simultaneously finds the best degrees of membership and the optimal kernel weights for a non-negative combination of a set of kernels. The objective function of MKFCM is given in the equation below.
Kernel Functions
In the above equation,
Different kernels can be chosen to replace the Euclidean distance in different situations. An appropriate Gaussian kernel is essential for clustering, and the clustered data is then stored in the cloud under the privacy and security procedures.
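To illustrate how a kernel replaces the Euclidean distance, the sketch below combines two Gaussian kernels with non-negative weights and derives the squared distance in feature space, K(x, x) - 2K(x, c) + K(c, c). The linear weighting, the kernel widths and the weights are assumptions made for illustration; the exact MKFCM weighting in the paper may differ.

```python
import math

def gaussian_kernel(x, c, sigma):
    # Gaussian kernel: exp(-||x - c||^2 / (2 * sigma^2)).
    squared = sum((a - b) ** 2 for a, b in zip(x, c))
    return math.exp(-squared / (2.0 * sigma ** 2))

def combined_kernel(x, c, sigmas, weights):
    # Non-negative weighted combination of several Gaussian kernels.
    return sum(w * gaussian_kernel(x, c, s) for w, s in zip(weights, sigmas))

def kernel_distance_sq(x, c, sigmas, weights):
    # For Gaussian kernels K(x, x) = K(c, c) = sum(weights), so the squared
    # feature-space distance reduces to 2 * (sum(weights) - K(x, c)).
    return 2.0 * (sum(weights) - combined_kernel(x, c, sigmas, weights))

sigmas = (1.0, 3.0)    # hypothetical widths for kernels K1 and K2
weights = (0.6, 0.4)   # hypothetical non-negative kernel weights
origin = (0.0, 0.0)
d_self = kernel_distance_sq(origin, origin, sigmas, weights)
d_near = kernel_distance_sq(origin, (0.5, 0.0), sigmas, weights)
d_far = kernel_distance_sq(origin, (5.0, 0.0), sigmas, weights)
print(d_self, d_near, d_far)
```

Substituting `kernel_distance_sq` for the Euclidean distance in the FCM membership update gives a kernelized variant; the distance is bounded, which makes it less sensitive to outliers.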
To improve the process, dual encryption is used in the big data security procedure. The confidential information selected in the preceding step is encrypted in two stages. Homomorphic encryption is applied first, and its output is then passed to the second stage. There, the already encrypted information is processed further by the OSC.
A Homomorphic Encryption framework allows operations to be carried out on encrypted data without knowledge of the private key. Here, the client is the holder of the secret key [
Steps for HE
Four functions are present in Homomorphic Encryption procedures.
Step 1: Key Generation:
Step 2: Select two large prime numbers
Step 3: Encryption:
Step 4: Compute ciphertext
Step 5: Decryption:
Step 6: Assume
Step 7: Compute plain text
Computation on the ciphertexts yields the same result that would have been obtained if the technique had worked directly on the raw data. Such encryption allows complex operations to be executed on encrypted information without compromising the encryption.
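The key generation, encryption and decryption steps can be illustrated with a toy example. The text does not name the homomorphic scheme, so the sketch below assumes the Paillier cryptosystem, which also starts from two primes and is additively homomorphic. The primes are deliberately tiny and the function names are hypothetical; this is not a secure implementation.

```python
import math
import random

def keygen(p, q):
    # Key generation from two primes (tiny here; real keys use large primes).
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse; valid for generator g = n + 1
    return n, lam, mu

def encrypt(n, message):
    # Ciphertext c = g^m * r^n mod n^2 with g = n + 1 and random r coprime to n.
    n_sq = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, message, n_sq) * pow(r, n, n_sq)) % n_sq

def decrypt(n, lam, mu, ciphertext):
    # Plaintext m = L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n.
    x = pow(ciphertext, lam, n * n)
    return ((x - 1) // n * mu) % n

n, lam, mu = keygen(1009, 1013)
c1, c2 = encrypt(n, 15), encrypt(n, 42)
# Multiplying ciphertexts adds the plaintexts without using the private key.
c_sum = (c1 * c2) % (n * n)
print(decrypt(n, lam, mu, c_sum))
```

The party holding only `n` can compute `c_sum`, while only the holder of `lam` and `mu` can read the result.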
SIMON is a lightweight block cipher designed for efficient implementation in hardware. The cipher family consists of simple round functions with various key sizes and block sizes, with sizes ranging from 32 to 128 bits. Each variant operates on plaintext blocks of a fixed size to produce the ciphertext blocks.
In security analyses, the nonlinear component of the SIMON cipher governs how differences propagate through a block. One can think of a tree rooted at a fixed input difference, in which a few possible output differences are produced at each round.
The strong structure of the round function is exploited by considering basic characteristics that are extended over more rounds at a decaying probability.
Key-schedule properties are eliminated in the SIMON key schedules by employing round constants and pixel quantity with regard to images.
For this lightweight block cipher, single-key differential characteristics were found for 15-round SIMON48.
The block cipher qualities are intended to be excellent. A minimum number of active S-boxes are ensured with quality, since the number finalizes the optimizer which in turn yields the optimum solution.
In wireless networks, the cipher model is executed as DI security. The model is described by its encryption and decryption procedures, its number of rounds and its bit sizes. The SIMON cipher operates on 2n-bit blocks, where the word size n lies in the range {16 to 64}. The following equation explains the same in detail.
‘Round functions’ are the functions of ciphers
Round configuration: In the SIMON block cipher, the round function takes 128 bits of plaintext as input. With a 128-bit key, the 128-bit ciphertext is produced in 68 rounds. The following steps explain the SIMON encryption operations.
A bitwise AND is applied to two n-bit words.
The result of the bitwise AND is then combined using bitwise XOR, and one further word is XOR-ed into the final value.
Where, rotation count is y by means of
The following equation gives the SIMON round function for encryption.
From
The key expansion of the SIMON cipher derives each round key from the master key. From the initial 128-bit master key, the selected SIMON64/128 configuration generates a total of 44 round keys of 32 bits each. The previously saved
To signify
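The AND, rotate and XOR operations of the SIMON round can be sketched as follows for 32-bit words, as in SIMON64/128. The round function uses the published SIMON rotation amounts (1, 8 and 2). The round keys, however, are random placeholders: the real key expansion is deliberately not reproduced here, so this sketch shows only the Feistel structure and its inversion, not interoperable SIMON output.

```python
import random

WORD_BITS = 32                 # word size n for SIMON64/128
MASK = (1 << WORD_BITS) - 1
ROUNDS = 44                    # number of rounds for SIMON64/128

def rotl(word, count):
    # Rotate an n-bit word left by `count` positions.
    return ((word << count) | (word >> (WORD_BITS - count))) & MASK

def f(word):
    # SIMON round function: (S^1 x AND S^8 x) XOR S^2 x.
    return (rotl(word, 1) & rotl(word, 8)) ^ rotl(word, 2)

def encrypt_block(x, y, round_keys):
    # Each Feistel round maps (x, y) to (y XOR f(x) XOR k, x).
    for k in round_keys:
        x, y = y ^ f(x) ^ k, x
    return x, y

def decrypt_block(x, y, round_keys):
    # Undo the rounds in reverse order with the same round keys.
    for k in reversed(round_keys):
        x, y = y, x ^ f(y) ^ k
    return x, y

# ASSUMPTION: placeholder round keys, NOT the SIMON key schedule.
rng = random.Random(0)
round_keys = [rng.getrandbits(WORD_BITS) for _ in range(ROUNDS)]

plaintext = (0x656B696C, 0x20646E75)   # two arbitrary 32-bit words
ciphertext = encrypt_block(*plaintext, round_keys)
recovered = decrypt_block(*ciphertext, round_keys)
print(recovered == plaintext)
```

Because each round is a Feistel step, decryption needs no inverse of `f`; it only replays the round keys in reverse.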
The key optimization process in SIMON utilizes Krill Herd Optimization (KHO) method [
Objective function for OSC
During decryption process with key K, minimum number data is retrieved as the fitness function. Multiple key sets in TDES process generate the initial solution and the condition is satisfied using an optimal key.
New keys updating process
An optimization algorithm should be able to search spaces of arbitrary dimensionality. The Lagrangian model is therefore generalized to an n-dimensional decision space.
Hence, the ith krill individuals are with physical diffusion,
Movement induced by other krill individuals
The direction of motion of a krill individual is determined by the local swarm density (the local effect). The repulsive swarm density and the target swarm density are explained below:
The representation of
Foraging motion
The foraging motion is formulated in terms of two main parameters. The first is the food location and the second is the previous experience of the food location. At
where the foraging speed is denoted by
Physical diffusion
The physical diffusion of the krill individuals is considered to be a random process. This motion is expressed in terms of a maximum diffusion speed and a random directional vector.
Here, the maximum diffusion speed is
Crossover and mutation
The crossover operator, which originates in the genetic algorithm (GA), is an effective procedure for global optimization. Mutation is controlled by the mutation probability (Mr).
With this new mutation probability, the mutation probability for the global best is equal to zero, and it increases as the fitness decreases.
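The three motion components described above (motion induced by other krill, foraging and physical diffusion) can be combined in a simplified optimizer. The sketch below is a reduced illustration rather than the Modified KHO used in the paper: it keeps only the attraction toward the best krill and the food centre, omits crossover and mutation, and minimizes a placeholder sphere function instead of the key fitness. The speed and inertia values are assumptions.

```python
import math
import random

def unit(vec):
    # Normalise a vector; a zero vector stays zero.
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else [0.0] * len(vec)

def krill_herd(fitness, dim, bounds, n_krill=20, iters=50, seed=0):
    rng = random.Random(seed)
    lo, hi = bounds
    n_max, v_f, d_max = 0.01, 0.02, 0.005  # assumed induced/foraging/diffusion speeds
    w_n, w_f = 0.5, 0.5                    # assumed inertia weights
    dt = 0.5 * dim * (hi - lo)             # time step scaled to the search space
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_krill)]
    induced = [[0.0] * dim for _ in range(n_krill)]
    forage = [[0.0] * dim for _ in range(n_krill)]
    fit = [fitness(p) for p in pos]
    best_fit = min(fit)
    best = list(pos[fit.index(best_fit)])
    initial_best = best_fit
    for t in range(iters):
        # Food centre: centre of mass weighted by inverse fitness
        # (assumes a non-negative fitness that is being minimised).
        inv = [1.0 / (value + 1e-12) for value in fit]
        total = sum(inv)
        food = [sum(w * p[d] for w, p in zip(inv, pos)) / total for d in range(dim)]
        for i in range(n_krill):
            alpha = unit([b - x for b, x in zip(best, pos[i])])  # target effect
            beta = unit([c - x for c, x in zip(food, pos[i])])   # food attraction
            for d in range(dim):
                induced[i][d] = n_max * alpha[d] + w_n * induced[i][d]
                forage[i][d] = v_f * beta[d] + w_f * forage[i][d]
                # Random diffusion that shrinks as the iterations progress.
                diffusion = d_max * (1 - t / iters) * rng.uniform(-1, 1)
                step = dt * (induced[i][d] + forage[i][d] + diffusion)
                pos[i][d] = min(hi, max(lo, pos[i][d] + step))
            fit[i] = fitness(pos[i])
            if fit[i] < best_fit:
                best_fit, best = fit[i], list(pos[i])
    return best, best_fit, initial_best

def sphere(x):
    # Placeholder objective; the paper's key fitness is not reproduced here.
    return sum(v * v for v in x)

best, best_fit, initial_best = krill_herd(sphere, dim=2, bounds=(-5.0, 5.0))
print(best, best_fit)
```

To search for cipher keys, the continuous position would have to be mapped to a key and scored by the paper's fitness function, which this sketch leaves open.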
The optimal keys are 64 bits long and are selected for their compatibility with, and adaptability to, the SIMON cipher. The proposed technique encrypts the information for storage in the cloud.
With the dual encryption described above, the deduplicated input information is encrypted. After encryption, the document is stored in the cloud, where it remains accessible to genuine clients. Owing to the secure double encryption mechanism, confidential information cannot be inferred directly. Under the proposed strategy, a legitimate message authentication or valid signature can never be produced by an unauthorized party.
The proposed work, with its clustering model and double encryption, was implemented in Java with JDK 1.7.0. The experiments ran on a 1.6 GHz Intel(R) Core i5 processor with 4 GB RAM under the Windows 10 operating system. Datasets containing different medical information were processed through the Map-Reduce framework in a cloud environment. The following section explains the databases and the comparative investigation.
The big data security procedure was validated on medical databases from the UCI machine learning repository. In total, the maximum size of the databases, which included the breast cancer and Switzerland databases, was 1,000,000.
Switzerland database: The database contains 76 attributes, of which a subset of 14 was used in the published experiments. In particular, ML researchers have used one specific database from this collection. The 'goal' field refers to the presence of coronary heart disease.
Breast cancer database: Samples arrived periodically as Dr. Wolberg reported his clinical cases. The database therefore reflects this chronological grouping of the data.
File size (MB) | Execution time (ms) | Encryption time (ms) | Decryption time (ms) | Memory (bits) |
---|---|---|---|---|
1 | 37489 | 7865 | 5868 | 1225887 |
2 | 46215 | 13252 | 8564 | 1325588 |
3 | 50235 | 13658 | 11254 | 1579554 |
4 | 56689 | 16524 | 12332 | 1544712 |
5 | 621028 | 18256 | 16235 | 1768525 |
File size (MB) | Execution time (ms) | Encryption time (ms) | Decryption time (ms) | Memory (bits) |
---|---|---|---|---|
1 | 36488 | 7651 | 5469 | 1215784 |
2 | 45215 | 11245 | 8654 | 1325586 |
3 | 48962 | 13658 | 10268 | 1478548 |
4 | 55698 | 15478 | 12447 | 1544754 |
5 | 61025 | 18646 | 14639 | 1655882 |
File size (MB) | Execution time (ms) | Encryption time (ms) | Decryption time (ms) | Memory (bits) |
---|---|---|---|---|
1 | 32092 | 7513 | 3912 | 1214276 |
2 | 42313 | 11103 | 8218 | 1320926 |
3 | 47951 | 13639 | 7596 | 1475240 |
4 | 54759 | 15293 | 7635 | 1540292 |
5 | 56969 | 18234 | 10216 | 1651941 |
File size (MB) | Switzerland data, hash function based (%) | Switzerland data, authorized party based (%) | Breast cancer data, hash function based (%) | Breast cancer data, authorized party based (%) |
---|---|---|---|---|
1 | 44.65 | 49.85 | 44.74 | 42.77 |
2 | 46.28 | 51.82 | 47.11 | 45.80 |
3 | 48.69 | 47.80 | 49.66 | 48.64 |
4 | 46.39 | 49.50 | 46.82 | 45.29 |
5 | 51.15 | 55.15 | 52.05 | 50.51 |
The accuracy analysis of clustering techniques such as FCM and MKFCM for Switzerland database is shown in
The proposed de-duplication method for encrypted big data in cloud computing was analyzed using an optimal double encryption approach. The framework reduced the amount of storage capacity required by cloud service providers. In addition to the double encryption technique, a de-duplication process was proposed in which a hash function was used to transfer the data in an encrypted format. After this, data duplication was checked and other functions such as modification, deletion, and de-duplication of the data were performed. The de-duplication process removed the repetitive blocks. Here, the MKFCM clustering model was used to analyze the de-duplication process. The MKFCM algorithm identified the best degrees of membership and the optimal kernel weights for a non-negative combination of a set of kernels. Key optimization in OSC was accomplished using the KHO technique. Through these processes, the proposed technique encrypted the data and stored it in the cloud. The findings of the study indicate that the proposed double encryption scheme ensured enhanced authentication accuracy and security compared to other techniques. The authors recommend that future researchers improve the performance of the proposed model using lightweight cryptographic techniques.