The most sensitive Arabic text available online is the digital Holy Quran. This sacred Islamic religious book is recited by all Muslims worldwide including non-Arabs as part of their worship needs. Thus, it should be protected from any kind of tampering to keep its invaluable meaning intact. Different characteristics of Arabic letters like the vowels (
A digital watermark is a digital signal or pattern, comparable to a digital signature, that is inserted directly into digital data to protect the copyright ownership of digital content, including text, image, video and audio [
Text watermarking has become important due to developments in the field of information hiding. Protecting such information is necessary and useful because preventing the copying of digital media and preserving it unchanged is not easy. Watermarking is widely regarded as an effective way to host secret text, identify ownership, provide authentication and keep documents secure. As a result, many researchers work on watermarking applications, and it has become an important field of study [
There are many methods for embedding secret data in text, each designed around the features of an individual language. As the Quran is written in Arabic, one needs to know the features of the language in order to hide secret data within it. Existing methods exploit the “kashida” letter extension (one of the features of Arabic), the diacritics, the order of the words in a statement and the conjunctions of the Arabic language. Hybrid approaches (mixing two or more features) are also used to hide a secret message in the text of the Quran. Each of these methods has advantages and disadvantages, so a new algorithm is needed that overcomes the disadvantages of the existing methods [
This paper aims to improve the embedding capacity and imperceptibility by proposing a Quran text watermarking method that preserves the original text and meaning of the Quran and is based on vowels with three different types of characters (
Arabic is the principal language in all Arab countries of the Middle East and Northern Africa. It is considered the world’s 5th most influential language. It is central to other languages in the Muslim world such as Farsi (Persian), Urdu, Sindhi and Pashto. Some minority languages in China, such as Uighur, Kazakh and Kirghiz, are written using a modified Arabic script. The Arabic alphabet is actually an abjad of 28 characters with many unique characteristics. For example, Arabic script is always written from right to left, it has no capital letters, and each letter takes different forms according to its location in the word. In addition, many Arabic letters carry one, two or three dots placed above or below them. Most Arabic words contain several characters that are linked together, and this connection feature is useful for data hiding. Furthermore, Arabic contains 15 dotted characters, five of which are multi-point characters, unlike English, which has no multi-point characters. Meanwhile, each word often has some special marks called “Harakaat” [
These diacritics are represented by several digitally inserted characters as separate letters found inside the computer’s 0 location. The use of the Arabic grammatical markings in the Arabic language is mentioned according to the standards and practices of modern Arabic writing. Therefore, it is necessary for the Quran and most religious texts [
Moreover, the “Kashida” which is used between Arabic characters has a well-known extension character [
A watermarking process based on kashida with a dotting property has been presented in [
This part summarizes the state of the art in recent Arabic text watermarking. Text watermarking is generally one of the most challenging kinds of watermarking. It tries to exploit the characteristics of the letters, or of the entire text, in a certain language to hide information. Compared with the watermarking of images or audio, text has relatively little redundant information, which makes hiding data in it difficult. However, texts differ from one language to another. For instance, the Arabic language has specific writing characteristics that make text watermarking much more applicable than in other languages [
The Arabic language uses special characters such as the kashida, diacritics, special letters and vowels, as well as special shapes of certain letters, to give the text a higher level of clarity and a more artistic appearance. It should be mentioned that the Arabic language has 6 types of artistic writing, which can be recognized from the shape of the letters. These styles were developed gradually through antiquity and are still used today. For instance, the kashida is a way to lengthen some of the letters in Arabic text and connect them to other letters on the right side (Arabic script is written from right to left), while some letters accept the insertion of the kashida after them [
Many studies have tried to use watermarking in Arabic texts or in the text of the Quran. For example, [
The use of diacritics in Arabic text is necessary for those involved in the text analysis. However, diacritics-based watermarking is seen as a critical issue if the original text has been dispossessed of the diacritics in the detection process. Therefore, being able to perform diacritics-based watermarking at 100% depends on the number of diacritical marks in the text. There are 22 letters in the Arabic language that are usually connected to form a single word. Sometimes a single letter in the Arabic language can be written in four different shapes according to the location of the letter in a certain word as shown above in
A Unicode method that assigns a unique number to each letter in the Arabic language for programming purposes has been developed by [
Conversely, six out of the 28 Arabic letters are categorized as special letters that do not connect with other letters when these letters fall at the beginning or in the middle of the word. These special letters are ((
Some scholars have used the kashida and dotted letters in the Arabic language for coding. An approach that codes a secret bit ‘1’ with a dotted letter followed by a kashida, and a secret bit ‘0’ with an undotted letter followed by a kashida, has been presented in [
Dividing the Arabic letters into two sets based on the occurrences of these letters has been presented in [
Hiding information in Arabic text via using sun and moon letters in the Arabic language has been proposed in [
Increasing the capacity of kashida by adding white spaces as an extra redundant cover has been proposed in [
The Holy Quran consists of 30 parts (juz’) which includes 114
Many studies have suggested the use of diacritics or kashida instead of vowel characters for hiding information [
Unfortunately, there are no existing studies in which the Arabic vowels
A reverse process or technique runs a process in the opposite direction. In this context, reversing can be accomplished by writing down the sequence of steps and then working backwards from the end of the text to its beginning. In watermarking, the sender performs a process called embedding, which hides or inserts the data within the cover media, and the receiver tries to reverse that process to recover the hidden data [
A multilevel histogram technique which is modified for the implementation of vector maps reversible watermarking strategy has been presented in [
A novel reversible data hiding technique in encrypted images is presented by [
Watermarking is the procedure of hiding data in media such as images, audio, video and text. Most watermarking techniques in the literature use images, audio or video as the cover media. This work instead addresses text watermarking, using a new algorithm to hide the secret bits within Quranic text. The proposed algorithm comprises four main phases. The first is the pre-processing phase, which prepares the hosting media (the Quranic text in this case) and the secret bits (the data to be hidden in the text). The second phase embeds the secret bits within the Quran text. The third phase extracts the data, including the attack process. The final phase evaluates the performance of the scheme in terms of various measures. The details of these phases are discussed as follows:
In the proposed watermarking scheme, data pre-processing is performed on the secret watermark and the hosting media. Arabic characters are represented in the computer by Unicode code points written in hexadecimal, in a range that starts at 0600 and ends at 06FF. Hexadecimal is a positional number system with base 16: instead of the ten symbols of the decimal system, it uses sixteen distinct symbols. Most commonly, the symbols “0” to “9” represent the values zero to nine, and “A” to “F” represent the values ten to fifteen. Each character has its own hexadecimal number, including language features such as the kashida and the diacritics. Inserting one kashida therefore lengthens the statement by one character with its own hexadecimal number. Accordingly, the Quranic text is converted to these hexadecimal codes before embedding so that the letters of the Quran can be analyzed, as shown in
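As a small illustration of the hexadecimal representation described above, the following Python sketch lists the code of each character in a short Arabic word and shows that inserting one kashida adds exactly one character. The function names and the sample word are illustrative assumptions, not part of the proposed scheme, which the authors implemented in MATLAB.

```python
KASHIDA = "\u0640"  # ARABIC TATWEEL, the kashida extension character


def to_hex_codepoints(text):
    """Return the 4-digit uppercase hexadecimal code of each character."""
    return [f"{ord(ch):04X}" for ch in text]


def in_arabic_block(ch):
    """True if the character lies in the range 0600 to 06FF."""
    return 0x0600 <= ord(ch) <= 0x06FF


# Hypothetical sample word: qaf, alef, lam.
word = "\u0642\u0627\u0644"
# Insert one kashida after the first letter: the text grows by one character.
stretched = word[:1] + KASHIDA + word[1:]

print(to_hex_codepoints(word))
print(to_hex_codepoints(stretched))
```

The kashida is an ordinary character in the same range, which is why it can be detected and removed again by simple comparison.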
In addition to the preparation of the Quran text, the secret message is converted into ASCII code and then into a binary sequence of zeros (0) and ones (1). To increase the capacity for embedded secret data, the secret data is also compressed. After this process the pre-processing phase is complete: the Quran cover text and the secret bits are ready for the next phase, embedding.
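The conversion of the secret message into bits can be sketched as follows. This is a minimal sketch that assumes an ASCII message and eight bits per character, most significant bit first; the compression step is not specified in the text and is therefore left out, and the function names are hypothetical.

```python
def text_to_bits(message):
    """Encode an ASCII message as a list of bits, 8 per character, MSB first."""
    data = message.encode("ascii")
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def bits_to_text(bits):
    """Inverse of text_to_bits: group bits into bytes and decode as ASCII."""
    if len(bits) % 8 != 0:
        raise ValueError("bit sequence length must be a multiple of 8")
    values = []
    for start in range(0, len(bits), 8):
        byte = 0
        for bit in bits[start:start + 8]:
            byte = (byte << 1) | bit  # shift in one bit at a time
        values.append(byte)
    return bytes(values).decode("ascii")


secret_bits = text_to_bits("UTM")
print(secret_bits)
print(bits_to_text(secret_bits))
```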
This phase includes three processes which are locating or positioning the letters, adding the kashida to achieve the embedding, and checking the match of the secret bits with the bits of the Quran text for embedding decision as shown in
Each of the abovementioned processes depends on the previous one in the sequence. For example, the addition of the Kashida cannot happen unless the right position is determined and this phase involves the following three sub-processes:
The Quran text consists of the statements called
The extension letter, or kashida, is added when the bit to be hidden or embedded equals one (1). When the secret bit equals 0, no kashida is added. Adding the kashida to a word does not change the meaning of the word. The mechanism extends the vowel itself with a kashida where possible, or the letter next to it, to represent a 1 bit of the secret bits. Likewise, the absence of a kashida in these positions represents a 0 bit, as shown in
Embedding involves three sub-processes: first, the embedding of bit (1) was done after the vowels wherein Kashida was inserted to represent a bit (1) similar to the vowels in the second word. Second, in the last vowels, the Kashida was inserted into the vowel itself (meaning the letter
To increase the complexity of the technique and make it harder to break, the serial hidden bits must be manipulated before embedding. Bits equal to 0 are embedded without affecting the words or verses. However, the expected redundancy of the embedded bits should be avoided. In the reversing step, the secret bits that are most similar to the cover text bits must be considered.
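The basic embedding rule described above (a kashida after a vowel for a 1 bit, no kashida for a 0 bit) can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the vowel set is assumed to be alef, waw and yaa, the cover text is assumed to contain no kashida beforehand, Arabic letter-joining rules are ignored, and the bit manipulation and reversing steps are omitted. The name `embed` is hypothetical.

```python
KASHIDA = "\u0640"  # ARABIC TATWEEL
# Assumption: the three vowel characters are alef, waw and yaa.
VOWELS = {"\u0627", "\u0648", "\u064A"}


def embed(cover, bits):
    """Insert a kashida after the i-th vowel when bits[i] == 1.

    Each vowel in the cover text is one embedding position; a 0 bit
    leaves the position unchanged.
    """
    out = []
    used = 0  # number of secret bits consumed so far
    for ch in cover:
        out.append(ch)
        if ch in VOWELS and used < len(bits):
            if bits[used] == 1:
                out.append(KASHIDA)
            used += 1
    if used < len(bits):
        raise ValueError("cover text has too few vowel positions")
    return "".join(out)


# Hypothetical cover word with three vowel positions: qaf alef lam waw alef.
cover = "\u0642\u0627\u0644\u0648\u0627"
watermarked = embed(cover, [1, 0, 1])
print([f"{ord(c):04X}" for c in watermarked])
```

Because 0 bits leave the text untouched, a secret sequence with fewer 1 bits changes the cover text less, which is the motivation the text gives for manipulating the bits before embedding.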
As displayed in
The main purpose of the reverse process is to reduce the modification of the text as much as possible, since more changes to the text of the Quran are more likely to be noticed or recognized by an intruder. Imperceptibility measures how much the host media is changed and how noticeable the difference between the changed and the original media is. This method also improves security, because recovering the hidden data is very difficult without the key. In addition, increasing authenticity is one of the objectives of text watermarking. Regarding
The extraction process recovers the hidden data from the watermarked text. This process is exactly the reverse of the embedding stage. First, the receiver needs the watermarked Quran text. In addition to the original hosting media, the secret bit is needed because it contains the most important information for extraction. Without the secret bit, the receiver cannot extract any information. The extraction procedure of the proposed approach looks for the vowels in order to extract the watermark from the text. Each process can be followed by its color, as shown in
This process is exactly the reverse of the embedding procedure.
Otherwise, the extracting algorithm of the proposed approach starts to read again.
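The vowel-scanning extraction can be sketched as the mirror image of a simple kashida-after-vowel embedding. This is a minimal sketch under the same assumptions as such an embedding (vowel set alef, waw and yaa; no kashida in the original cover text); here the number of hidden bits, `n_bits`, stands in for the shared secret information the receiver needs, which is an assumption of this example. The function names are hypothetical.

```python
KASHIDA = "\u0640"  # ARABIC TATWEEL
# Assumption: the three vowel characters are alef, waw and yaa.
VOWELS = {"\u0627", "\u0648", "\u064A"}


def extract(watermarked, n_bits):
    """Read one bit per vowel: 1 if a kashida follows it, otherwise 0."""
    bits = []
    for idx, ch in enumerate(watermarked):
        if len(bits) == n_bits:
            break
        if ch in VOWELS:
            follower = watermarked[idx + 1] if idx + 1 < len(watermarked) else ""
            bits.append(1 if follower == KASHIDA else 0)
    return bits


def strip_kashida(watermarked):
    """Recover the cover text by removing every inserted kashida."""
    return watermarked.replace(KASHIDA, "")


# Hypothetical watermarked word: qaf alef+kashida lam waw alef+kashida.
watermarked = "\u0642\u0627\u0640\u0644\u0648\u0627\u0640"
print(extract(watermarked, 3))
print(strip_kashida(watermarked) == "\u0642\u0627\u0644\u0648\u0627")
```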
The proposed algorithms were implemented in MATLAB. To evaluate the performance of the proposed approach, the standard and authentic version of the Quran dataset was taken from
The main aim of the research is to improve watermarking in terms of capacity. Capacity is determined by the embedding ratio and the efficiency ratio. The embedding ratio refers to the amount of secret data embedded relative to the available places in the text file. Note that the embedding ratio is computed over the positions that satisfy the embedding conditions, not over the total size of the text file. The embedding ratio (ER) is therefore an important criterion in watermarking and steganography systems. The embedding ratio can be defined according to
Following the above equation, it is clear that the main factor that affects the error rate is the total number of letters in the cover file. A larger file produces a lower error rate. Given this, the embedding ratio will decrease, so the percentage added to the file size is naturally lower as well. In this regard, it is important to mention the inverse method, which is implemented to reduce the number of embedded watermark letters and so keep the numerator as high as possible (the total letters of the cover text and the letters of the embedded watermark) in
In addition, applying the reversing technique helps the embedding ratio because it decreases the distortion of the cover text, which in turn allows more secret bits to be embedded. However, the embedding ratio results show an exponential impact, as noticed in
To benchmark the proposed method, the embedding ratio is used as the main criterion, since it shows the strength of the proposed strategy and makes it easy to compare with the relevant state-of-the-art works. Along these lines, a comparative analysis has been carried out to show the improvement of the proposed algorithm and how it avoids the drawbacks of the relevant methods.
Methods | Capacity evaluation (ER) (%) |
---|---|
[ | 61.6 |
[ | 73.4 |
Proposed method | 90.58 |
In accordance with the results illustrated in
This shows that the proposed model can achieve relatively high concealment and hiding capacity at the same time by adjusting the average number of bits embedded in each word. The proposed method uses vowels and can add a kashida after each of these letters (
On the other hand, the distribution of the embedded secret bits over the cover Quran text is reflected in the Peak Signal to Noise Ratio (PSNR) value, which detects the frequency of the bits within the cover Quran text. When the bits become heterogeneous, or in other words more chaotic, the PSNR takes a high value. The reversing technique has been proposed for the embedding procedures. A high PSNR reflects good Quran text quality, so all previous methods try to increase the PSNR [
Subjective methods depend on human observation and judgment, without using any reference criteria. PSNR is used to measure the quality of the Quran text after embedding and is defined according to
With,
MAX is the maximum possible letters value of the text; m, n is the range of the whole text and number of
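For reference, the standard PSNR definition that uses the symbols MAX, m and n can be written as follows. Treating the cover text $I$ and the watermarked text $K$ as $m \times n$ arrays of character values is an assumption made here for illustration; the symbols $I$, $K$ and MSE are not taken from the description above.

$$
\mathrm{MSE} = \frac{1}{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\bigl[I(i,j)-K(i,j)\bigr]^{2},
\qquad
\mathrm{PSNR} = 10\log_{10}\!\left(\frac{\mathrm{MAX}^{2}}{\mathrm{MSE}}\right)\ \text{dB}
$$

Under this definition, a smaller difference between the two texts gives a smaller MSE and therefore a higher PSNR.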
The reversing technique used in the proposed system has a strong effect, especially when the cover text is large enough. For a file size of 16 KB, the PSNR is 64.28 dB when the embedding procedure does not use the reversing mechanism. With the same cover text size and the same secret bits, the reversing technique raises the PSNR to 69.21 dB, as demonstrated in
Regarding
One inverse relationship is that between the number of pixels in the logo and the imperceptibility. This relationship appears clearly when the number of logo pixels increases: the distortion of the image grows and the imperceptibility decreases slightly.
According to the results which have been investigated in
Watermark (bytes) | Size of text file (KB) | Imperceptibility PSNR (dB) | Embedding ratio ER (%) |
---|---|---|---|
25 | 4 | 66.11 | 9.09 |
36 | 16 | 69.21 | 9.11 |
48 | 21.3 | 70.33 | 9.21 |
52 | 23.2 | 71.24 | 9.24 |
The proposed method achieves good imperceptibility (PSNR) compared with the existing state-of-the-art methods. These promising results are due to eliminating the disadvantages and constraints of the existing methods by using the vowels together with the reversing technique. In terms of the proposed strategy benchmarking,
Methods | Imperceptibility and Security Evaluation (PSNR) |
---|---|
[ | 62.46 dB |
[ | 70.88 dB |
Proposed Method | 72.33 dB |
This paper proposes a new approach for improving the embedding capacity and imperceptibility. The proposed strategy is realized by a text watermarking method based on vowels with three different types of characters (
Future work will focus on strengthening the verification phase by exploring the boundaries of the proposed approach and expanding it to handle more complex patterns. Several issues still need to be addressed. First, the availability of the digital Quran in different patterns is another pressing research problem, and it would be interesting to expand the proposed approach to validate those patterns as well. Second, it will be interesting to work on improving security and evaluating accuracy on large data sets. Furthermore, our immediate goal is to make this web-based system available to the public as widely as possible and to extend the platform to mobile users.
The authors would like to thank those who contributed to the article and who support them from Universiti Teknologi Malaysia (UTM) for their educational.