In hearing aid (HA) design, speech enhancement must be performed in real time. A digital hearing aid should provide a high signal-to-noise ratio, improved gain, and feedback elimination. In generic hearing aids, the performance varies across frequencies and is non-uniform, and existing noise cancellation and speech separation methods attenuate the voice signal in noisy environments. Existing noise suppression methods also reduce the strength of the desired signal, so uniform sub-band analysis performs poorly in the hearing aid context. In this paper, a speech separation method combining wavelet decomposition with the Non-negative Matrix Factorization (NMF) algorithm is proposed. The proposed non-uniform filter bank was validated using band power, signal-to-noise ratio (SNR), mean square error (MSE), signal-to-noise-and-distortion ratio (SINAD), spurious-free dynamic range (SFDR), error, and computation time. The speech recordings before and after separation were evaluated for quality using the objective speech quality measure of the International Telecommunication Union, Telecommunication standard ITU-T P.862.
Approximately 500 million people suffer from hearing loss, and their quality of life can be enhanced by using a digital hearing aid. However, only about 30% of patients use a hearing aid; this percentage can be increased by designing hearing aid devices with low noise and improved sound quality, with cost as another important factor. The different hearing aid types based on placement, Behind-The-Ear (BTE), Receiver-In-Canal (RIC), In-The-Canal (ITC), and In-The-Ear (ITE), share the same structure for sound collection and regeneration. The main functions of a hearing aid are shown in
The hearing aid should recognize speech signals amid environmental noise, part of which comes from speech babble, instrumentation noise, and other unwanted sounds. Too much reverberation also reduces speech intelligibility and overall sound quality. If the noise magnitude exceeds that of the voice, the efficiency of the speech processing unit is poor, so alternative methods for noise reduction in hearing aids need to be developed. In a binaural hearing aid (HA), noise is removed independently in each of the two systems. Despite advances in HA technology, improving speech intelligibility remains a challenge. In recent years, hearing aids have been connected to Android, iPhone Operating System (iOS), and Bluetooth-enabled phones. In most cases the collected sound is directional, and when the noise is stronger than the desired signal it is very difficult to remove. In addition, the frequency response of a hearing aid should be uniform across the audio spectrum, but in practice it is not; existing noise suppression methods also reduce the desired signal strength along with the noise, so uniform sub-band analysis performs poorly for hearing aids. Source separation is another important issue in the digital hearing aid (DHA): its microphone continuously receives N incoming speech signals, and hearing-impaired listeners cannot understand the resulting overlapped speech.
In this research work, sub-band analysis using a Fejer-Korovkin (FK) wavelet-based decomposition methodology is proposed. The wavelet-based de-noising algorithm minimizes the Gaussian noise present in the input signal, and increasing the number of filter-bank stages increases the rejection ratio. The proposed method performs better than the existing two-stage wavelet filter bank and tree-structured wavelet filter banks. For speech separation, the Non-negative Matrix Factorization (NMF) algorithm is used. The proposed filter bank architecture is compared with other architectures, and the error and computation time of the proposed method are measured for different divergences. For evaluation of the proposed and existing methods, mixed audio sources were used for speech separation, and parameters such as signal-to-noise ratio (SNR), signal-to-distortion ratio (SDR), signal-to-noise-and-distortion ratio (SINAD), spurious-free dynamic range (SFDR), mean square error (MSE), and band power were used.
The filter bank is an important functional block in digital hearing aids; it decomposes the input signal into different bands [
The discrete input signal is applied to the filter bank to produce a set of sub-band signals [
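As an illustration, the decomposition of a discrete input into sub-band signals can be sketched with a minimal two-channel analysis filter bank; the Haar filter pair below is an assumed stand-in for the paper's filters, chosen for brevity.

```python
import numpy as np

def analysis_filter_bank(x, h0, h1):
    """Filter the input with low-pass h0 and high-pass h1, then downsample by 2."""
    low = np.convolve(x, h0)[::2]   # approximation (low-frequency) sub-band
    high = np.convolve(x, h1)[::2]  # detail (high-frequency) sub-band
    return low, high

# Haar quadrature-mirror filter pair (assumed stand-in)
h0 = np.array([1.0, 1.0]) / np.sqrt(2)   # low-pass
h1 = np.array([1.0, -1.0]) / np.sqrt(2)  # high-pass

x = np.sin(2 * np.pi * 0.05 * np.arange(64))  # discrete input signal
low, high = analysis_filter_bank(x, h0, h1)
```

Because the Haar pair is orthonormal, the total energy of the two sub-band signals equals the energy of the input, which is a quick sanity check on the decomposition.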
For speech processing, a two-band orthonormal wavelet is used, which can be associated with an orthonormal filter bank [ The wavelets are orthonormal both intra- and inter-scale, whereas the corresponding scaling function of wavelet theory has only intra-scale orthonormality. The corresponding property for all values of m, n, m′, and n′ in the wavelet and scaling bases is given in
In
The wavelet function δ(t) and scaling function β(t) are built to maintain orthonormality. The filter functions h0(n) and h1(n) have finite duration [
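The orthonormality conditions on the filter functions can be checked numerically; the sketch below uses the Daubechies-2 pair as an assumed stand-in, since the FK coefficients are not listed in the text.

```python
import numpy as np

# Daubechies-2 low-pass filter h0(n) (assumed stand-in for the paper's filters)
h0 = np.array([1 + np.sqrt(3), 3 + np.sqrt(3),
               3 - np.sqrt(3), 1 - np.sqrt(3)]) / (4 * np.sqrt(2))
# High-pass filter h1(n) via the alternating-flip (QMF) construction
h1 = h0[::-1].copy()
h1[1::2] *= -1

unit_norm = np.dot(h0, h0)              # orthonormality: should equal 1
double_shift = np.dot(h0[:-2], h0[2:])  # <h0(n), h0(n-2)>: should be 0
cross = np.dot(h0, h1)                  # <h0(n), h1(n)>: should be 0
```

The same three checks apply to any orthonormal filter pair, including the FK filters used in the paper.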
In tree-structured filter banks, the input passes through two or more filters and the outputs are downsampled [
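The tree-structured decomposition can be sketched as follows: the low-pass output of each two-channel stage is fed back into the next stage, so each added stage splits off one more sub-band (Haar filters are again an assumed stand-in).

```python
import numpy as np

def two_channel(x, h0, h1):
    """One two-channel stage: filter, then downsample by 2."""
    return np.convolve(x, h0)[::2], np.convolve(x, h1)[::2]

def tree_decompose(x, h0, h1, levels):
    """Return [detail_1, ..., detail_L, approx_L] sub-band signals."""
    bands = []
    approx = x
    for _ in range(levels):
        approx, detail = two_channel(approx, h0, h1)
        bands.append(detail)
    bands.append(approx)  # final low-pass (approximation) band
    return bands

h0 = np.array([1.0, 1.0]) / np.sqrt(2)
h1 = np.array([1.0, -1.0]) / np.sqrt(2)
x = np.random.default_rng(0).standard_normal(256)
bands = tree_decompose(x, h0, h1, levels=4)
```

A 4-level tree yields 5 sub-bands, with each successive band covering a narrower low-frequency range, which is how a non-uniform (dyadic) frequency split arises.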
Here j represents the imaginary unit.
Among the wide variety of sound separation algorithms, the unsupervised Non-negative Matrix Factorization (NMF) dictionary learning algorithm is well suited to decomposing a sound mixture [
For speech regeneration, the NMF algorithm with the Euclidean distance (β = 2) may be the right choice [
where ⊙ denotes the Hadamard (element-wise) product.
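A minimal sketch of NMF with the Euclidean distance (β = 2) and its standard multiplicative update rules, in which the Hadamard (element-wise) products and divisions appear explicitly; the matrix names W (basis) and H (activations) follow the usual NMF convention rather than the paper's notation.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.abs(rng.standard_normal((20, 30)))   # non-negative data matrix
r = 4                                       # factorization rank (assumed)
W = np.abs(rng.standard_normal((20, r)))
H = np.abs(rng.standard_normal((r, 30)))

eps = 1e-12  # guards against division by zero
for _ in range(200):
    # Multiplicative updates: element-wise multiply/divide keeps W, H non-negative
    H *= (W.T @ X) / (W.T @ W @ H + eps)
    W *= (X @ H.T) / (W @ H @ H.T + eps)

err = np.linalg.norm(X - W @ H, "fro")      # Euclidean reconstruction error
```

The multiplicative form guarantees that the Euclidean error is non-increasing at each step and that non-negativity is preserved without explicit projection.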
The proposed speech separation method using the filter bank and NMF is shown in
The Fejer-Korovkin (FK) wavelet is used to design the proposed filter bank architecture, which is shown in
In
The
With
We define the Fejer-Korovkin filters by
The filter bank plot of |m0,n|² for n = 2, 4, …, 12 is obtained with the following
The
Non-negative Matrix Factorization (NMF) is used for blind source separation: the magnitude spectrogram X of the mixture is approximated by components with fixed spectra and time-varying gains. The audio mixture is separated into I channels with corresponding spectrograms Ci, 1 ≤ i ≤ I. The algorithm is based on a basis vector Bi and a time-varying gain Gi for each single speech component. In
In
The
The proposed clustering method can be used to form any number of sub-clusters; for more independent sources, hierarchical clustering can be used. Here, two clusters are created for the N channels. The clusters m, m∼ ∈ {1, 2} are separated using the vectors a∼, a∼(i) ∈ {1, 2}. The estimated energy E∼m∼ of the spectrograms of both clusters is given by
For uncorrelated sources, the energy is estimated from the mixture signals. We further assume that one cluster corresponds to a single source while the other cluster contains the remaining sources. Therefore, the first separated source esm1 is expected to correspond to the cluster with the lowest energy, because the other cluster corresponds to multiple sources:
The process repeats until all channels are assigned to one of the two clusters, and terminates once every source has been clustered.
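The energy-based selection of the first separated source can be sketched as follows; the rank-1 component model Ci = BiGi matches the description above, while the cluster assignment, dimensions, and random data in this example are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
F, T, I = 64, 100, 4
B = np.abs(rng.standard_normal((F, I)))    # spectral basis vectors B_i
G = np.abs(rng.standard_normal((I, T)))    # time-varying gains G_i

# Toy assignment of each component to cluster 1 or 2 (assumed)
assign = np.array([1, 2, 2, 2])

energies = {}
for m in (1, 2):
    # Cluster spectrogram: sum of the rank-1 component spectrograms in cluster m
    Cm = B[:, assign == m] @ G[assign == m, :]
    energies[m] = np.sum(Cm ** 2)          # estimated cluster energy

# The first separated source is taken from the lowest-energy cluster,
# since the other cluster is assumed to hold the remaining sources.
first_source = min(energies, key=energies.get)
```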
Wavelet used | Separated speech band | Frequency (Hz) | SNR (dB) | SDR (dB) | MSE | Band power |
---|---|---|---|---|---|---|
db2 | B1 | 2500 | −6.8218 | −6.8243 | 0.1187 | 0.0845 |
db2 | B2 | 6000 | −13.105 | −13.106 | 0.0351 | 0.0008 |
haar | B1 | 2500 | −6.7904 | −6.7929 | 0.0343 | 0.0832 |
haar | B2 | 6000 | −6.7390 | −6.7398 | 0.0343 | 0.0021 |
coif1 | B1 | 2500 | −6.8221 | −6.8251 | 0.0343 | 0.0845 |
coif1 | B2 | 6000 | −15.946 | −15.949 | 0.0343 | 0.0008 |
sym2 | B1 | 2500 | −6.8218 | −6.8243 | 0.1187 | 0.0845 |
sym2 | B2 | 6000 | −13.105 | −13.106 | 0.0351 | 0.0008 |
dmey | B1 | 2500 | −6.8261 | −6.8287 | 0.0343 | 0.0845 |
dmey | B2 | 6000 | −20.090 | −20.094 | 0.0343 | 0.0006 |
Wavelet used | Separated speech band | Frequency (Hz) | Band power | SNR (dB) | MSE |
---|---|---|---|---|---|
db2 | B1 | 16000 | 2.2531e−04 | −23.4695 | 0.0104 |
db2 | B2 | 8000 | 0.0070 | −18.3915 | 0.0220 |
db2 | B3 | 4500 | 0.0422 | −16.8541 | 0.0504 |
db2 | B4 | 1200 | 0.0084 | −6.5370 | 0.0098 |
db2 | B5 | 250 | 0.0198 | −6.4164 | 0.0206 |
haar | B1 | 16000 | 5.3615e−04 | −22.2270 | 0.0107 |
haar | B2 | 8000 | 0.0083 | −18.4683 | 0.0233 |
haar | B3 | 4500 | 0.0358 | −14.4176 | 0.0445 |
haar | B4 | 1200 | 0.0121 | −11.4527 | 0.0133 |
haar | B5 | 250 | 0.0178 | −4.7190 | 0.0196 |
The analysis of the proposed filter bank using the FK wavelet is carried out by evaluating the spectra of the noisy signal and the input signal.
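One possible way to carry out such a spectrum evaluation is sketched below: per-band power in dB is estimated from the FFT magnitude, and the SNR is computed against the clean reference; the test tone, noise level, and band edges are assumptions, not the paper's setup.

```python
import numpy as np

fs = 16000
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 440 * t)        # assumed clean test tone
noisy = clean + 0.1 * np.random.default_rng(4).standard_normal(fs)

# Per-band power from the FFT magnitude spectrum
spec = np.abs(np.fft.rfft(noisy)) ** 2
freqs = np.fft.rfftfreq(fs, d=1 / fs)
band = (freqs >= 300) & (freqs < 600)      # assumed band edges (Hz)
band_power_db = 10 * np.log10(spec[band].mean())

# SNR of the noisy signal relative to the clean reference
snr_db = 10 * np.log10(np.sum(clean ** 2) / np.sum((noisy - clean) ** 2))
```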
Band power (dB):

Wavelet type | B1 | B2 | B3 | B4 | B5 | B6 | B7 | B8 |
---|---|---|---|---|---|---|---|---|
db5 | 19.8 | 19.9 | 19.5 | 19.7 | 19.3 | 19.8 | 13.0 | 5.1 |
db40 | 19.6 | 19.9 | 19.6 | 19.9 | 19.6 | 19.9 | 14.3 | 5.0 |
sym13 | 19.8 | 19.9 | 19.6 | 19.8 | 19.7 | 19.8 | 13.6 | 5.0 |
sym21 | 19.7 | 19.9 | 19.6 | 19.8 | 19.6 | 19.8 | 14.0 | 5.0 |
coif1 | 19.6 | 19.9 | 19.5 | 19.6 | 18.6 | 18.6 | 11.9 | 5.4 |
dmey | 19.6 | 19.9 | 19.6 | 19.9 | 19.6 | 19.8 | 14.3 | 5.0 |
fk14 | 19.8 | 19.9 | 19.7 | 19.9 | 19.6 | 19.7 | 13.9 | 5.0 |
fk18 | 19.7 | 19.8 | 19.6 | 19.7 | 19.5 | 19.7 | 14.3 | 4.9 |
fk22 | 19.8 | 20.0 | 19.6 | 19.8 | 19.7 | 19.7 | 14.3 | 5.0 |
SNR (dB):

Wavelet type | B1 | B2 | B3 | B4 | B5 | B6 | B7 | B8 |
---|---|---|---|---|---|---|---|---|
db5 | 9.0 | 18.5 | 18.5 | 19.0 | 13.8 | 18.2 | 4.2 | 6.0 |
db40 | 20.2 | 20.0 | 20.2 | 17.1 | 17.8 | 18.3 | 4.2 | 5.7 |
sym13 | 20.8 | 19.7 | 19.5 | 17.9 | 16.8 | 18.6 | 4.2 | 5.6 |
sym21 | 19.6 | 20.5 | 18.7 | 19.6 | 18.1 | 17.4 | 4.0 | 5.7 |
coif1 | 17.9 | 19.6 | 18.4 | 18.9 | 9.0 | 8.7 | 4.9 | 6.6 |
dmey | 18.1 | 17.9 | 20.1 | 19.6 | 17.0 | 18.0 | 4.3 | 5.7 |
fk14 | 20.9 | 20.8 | 20.4 | 21.0 | 19.2 | 20.2 | 5.6 | 6.8 |
fk18 | 20.9 | 20.6 | 20.6 | 20.6 | 19.7 | 19.1 | 5.3 | 6.7 |
fk22 | 20.9 | 20.8 | 20.8 | 20.3 | 19.1 | 20.9 | 5.0 | 6.7 |
MSE:

Wavelet type | B1 | B2 | B3 | B4 | B5 | B6 | B7 | B8 |
---|---|---|---|---|---|---|---|---|
db5 | 0.06 | 0.06 | 0.06 | 0.06 | 0.06 | 0.06 | 0.10 | 0.36 |
db40 | 0.06 | 0.06 | 0.06 | 0.06 | 0.06 | 0.06 | 0.09 | 0.37 |
sym13 | 0.06 | 0.06 | 0.06 | 0.06 | 0.06 | 0.06 | 0.09 | 0.36 |
sym21 | 0.06 | 0.06 | 0.06 | 0.06 | 0.06 | 0.06 | 0.09 | 0.37 |
coif1 | 0.06 | 0.06 | 0.06 | 0.06 | 0.06 | 0.07 | 0.12 | 0.38 |
dmey | 0.06 | 0.06 | 0.06 | 0.06 | 0.06 | 0.06 | 0.09 | 0.37 |
fk14 | 0.06 | 0.06 | 0.06 | 0.06 | 0.06 | 0.06 | 0.08 | 0.34 |
fk18 | 0.06 | 0.06 | 0.06 | 0.06 | 0.06 | 0.06 | 0.09 | 0.34 |
fk22 | 0.06 | 0.06 | 0.06 | 0.06 | 0.06 | 0.06 | 0.09 | 0.34 |
SINAD (dB):

Wavelet type | B1 | B2 | B3 | B4 | B5 | B6 | B7 | B8 |
---|---|---|---|---|---|---|---|---|
db5 | 19.0 | 18.5 | 18.5 | 19.0 | 13.8 | 18.2 | 4.2 | 6.0 |
db40 | 20.2 | 20.0 | 20.2 | 17.1 | 17.8 | 18.3 | 4.2 | 5.7 |
sym13 | 20.8 | 19.7 | 19.5 | 17.9 | 16.8 | 18.6 | 4.2 | 5.6 |
sym21 | 19.6 | 20.5 | 18.7 | 19.6 | 19.2 | 17.4 | 4.0 | 5.7 |
coif1 | 17.9 | 19.6 | 18.4 | 18.9 | 19.0 | 18.7 | 4.9 | 6.6 |
dmey | 18.1 | 18.0 | 20.1 | 19.6 | 17.0 | 18.0 | 4.3 | 5.7 |
fk14 | 20.9 | 20.7 | 20.4 | 20.2 | 19.2 | 20.2 | 5.6 | 6.8 |
fk18 | 20.9 | 20.6 | 20.6 | 20.6 | 19.7 | 20.1 | 5.3 | 6.7 |
fk22 | 20.9 | 20.8 | 20.8 | 20.3 | 19.2 | 20.9 | 5.0 | 6.7 |
SFDR:

Wavelet type | B1 | B2 | B3 | B4 | B5 | B6 | B7 | B8 |
---|---|---|---|---|---|---|---|---|
db5 | 1.36 | 1.03 | 2.36 | 1.56 | 3.94 | 0.60 | 7.14 | 5.36 |
db40 | 0.27 | 0.45 | 0.57 | 2.42 | 2.56 | 0.64 | 6.45 | 1.97 |
sym13 | 0.35 | 1.03 | 2.00 | 1.46 | 4.64 | 0.58 | 5.99 | 5.71 |
sym21 | 2.11 | 1.09 | 0.69 | 0.41 | 1.81 | 3.12 | 5.94 | 5.43 |
coif1 | 2.21 | 0.48 | 1.95 | 0.09 | 4.01 | 4.10 | 2.49 | 5.08 |
dmey | 1.69 | 1.60 | 0.79 | 1.05 | 0.90 | 2.76 | 6.12 | 5.44 |
fk14 | 2.40 | 1.23 | 2.38 | 2.50 | 2.60 | 3.81 | 7.81 | 6.09 |
fk18 | 2.94 | 1.83 | 2.38 | 2.46 | 2.90 | 3.15 | 7.34 | 5.81 |
fk22 | 2.65 | 1.62 | 2.42 | 2.77 | 2.80 | 3.12 | 7.25 | 5.87 |
The sources in the hearing aid system are statistically independent, and the linear mixtures are separated using Independent Component Analysis (ICA) and Blind Source Separation (BSS). The experimental setup uses two speech signals and noise: a microphone records the male and female voices with proximity gains a1 and b1, respectively. Hence they can be source-separated using ICA. The observed linear mixture is given as
The unmixing matrix is approximately A−1, so that
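The mixing/unmixing relation can be illustrated directly; here the unmixing matrix is taken as the exact inverse of an assumed mixing matrix A, whereas ICA would estimate it blindly from the observations alone.

```python
import numpy as np

rng = np.random.default_rng(2)
s = rng.standard_normal((2, 1000))          # two independent sources
A = np.array([[1.0, 0.6],                   # assumed mixing matrix with
              [0.4, 1.0]])                  # proximity gains off-diagonal
x = A @ s                                   # observed linear mixtures

W_unmix = np.linalg.inv(A)                  # unmixing matrix ~ A^-1
s_hat = W_unmix @ x                         # recovered sources
```

With the exact inverse, the sources are recovered perfectly; in the blind setting, ICA recovers them only up to permutation and scaling.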
The input signals are analyzed using the Non-negative Matrix Factorization (NMF) algorithm, and the source matrices W and H are generated. The error and computation time of the proposed NMF source separation algorithm for different divergences are presented in
S.No | Divergence | Error | Time (s) |
---|---|---|---|
1 | nmf_kl | 8060.38 | 0.53 |
2 | nmf_kl_ns | 7864.43 | 0.49 |
3 | nmf_kl_loc | 39863.18 | 0.52 |
4 | nmf_kl_con | 7864.55 | 0.50 |
5 | nmf_euc_orth | 108743.00 | 0.66 |
6 | nmf_euc | 104976.70 | 1.41 |
7 | nmf_convex | 119376.86 | 0.44 |
8 | nmf_beta (β = 0) | 9870.46 | 1.59 |
9 | nmf_amari | 5856.42 | 0.79 |
10 | nmf_euc_sparse_es | 270814.32 | 0.45 |
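A sketch of how such an error/time comparison can be produced: run NMF under two of the divergences (Euclidean and generalized Kullback-Leibler, both with their standard multiplicative updates) and record the final error and wall-clock time. The function names, sizes, and iteration count are assumptions, not the paper's code.

```python
import time
import numpy as np

eps = 1e-12  # guards against division by zero

def euc_update(X, W, H):
    """Multiplicative update for the Euclidean (beta = 2) divergence."""
    H = H * (W.T @ X) / (W.T @ W @ H + eps)
    W = W * (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def kl_update(X, W, H):
    """Multiplicative update for the generalized Kullback-Leibler divergence."""
    V = W @ H + eps
    H = H * (W.T @ (X / V)) / (W.sum(axis=0, keepdims=True).T + eps)
    V = W @ H + eps
    W = W * ((X / V) @ H.T) / (H.sum(axis=1, keepdims=True).T + eps)
    return W, H

def nmf(X, r, update, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    W = np.abs(rng.standard_normal((X.shape[0], r)))
    H = np.abs(rng.standard_normal((r, X.shape[1])))
    for _ in range(n_iter):
        W, H = update(X, W, H)
    return W, H

X = np.abs(np.random.default_rng(3).standard_normal((32, 64)))
results = {}
for name, update in [("nmf_euc", euc_update), ("nmf_kl", kl_update)]:
    t0 = time.perf_counter()
    W, H = nmf(X, 4, update)
    results[name] = {"error": np.linalg.norm(X - W @ H, "fro"),
                     "time": time.perf_counter() - t0}
```

The same loop extends naturally to the other divergences in the table by supplying their update rules.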
The performance comparison of the existing and proposed methods is shown in
Wavelet type | Band power (dB) | Signal-to-noise ratio (SNR) (dB) | Mean square error (MSE) | Signal-to-noise-and-distortion ratio (SINAD) (dB) | Spurious-free dynamic range (SFDR) |
---|---|---|---|---|---|
db5 | 5.1379 | 6.0394 | 0.3599 | 6.0394 | 5.3621 |
db40 | 4.9724 | 5.6697 | 0.3696 | 5.6697 | 1.9688 |
sym13 | 5.0347 | 5.5948 | 0.3642 | 5.5948 | 5.7113 |
sym21 | 4.9620 | 5.6577 | 0.3727 | 5.6577 | 5.4252 |
coif1 | 5.4415 | 6.5812 | 0.3770 | 6.5812 | 5.0758 |
dmey | 4.9639 | 5.7350 | 0.3706 | 5.7350 | 5.4405 |
Proposed fk14 | 4.9732 | 6.7527 | 0.3393 | 6.7527 | 6.0928 |
Proposed fk18 | 4.9308 | 6.7338 | 0.3409 | 6.7338 | 5.8103 |
Proposed fk22 | 4.9805 | 6.6878 | 0.3373 | 6.6878 | 5.8748 |
The speech recordings before and after separation are evaluated for quality using the objective speech quality measure ITU-T P.862.
Scores per ITU-T P.862; the columns db5 through dmey use the existing wavelet + non-negative matrix factorization (NMF) separation, and fk14 through fk22 use the proposed wavelet + NMF separation.

Source | db5 | db40 | sym13 | sym21 | coif1 | dmey | fk14 | fk18 | fk22 |
---|---|---|---|---|---|---|---|---|---|
Source 1 | 1.5912 | 1.3339 | 1.3922 | 1.4719 | 0.8961 | 1.2298 | 2.8356 | 2.0573 | 2.2219 |
Source 2 | 2.2007 | 0.9160 | 1.6902 | 1.0330 | 2.0850 | 1.7251 | 1.8767 | 2.6213 | 2.7645 |
This paper presents a new method for speech separation in hearing aids that provides a high signal-to-noise ratio. The wavelet-based decomposition methodology using the Fejer-Korovkin (FK) wavelet performs better than the existing two-stage wavelet filter bank and tree-structured wavelet filter bank decompositions. For speech separation, the Non-negative Matrix Factorization (NMF) algorithm is used, and the error and computation time of the proposed method are measured for different divergences. For evaluating the proposed and existing speech separation methods, mixed audio sources were used and the ITU-T P.862 standard was applied. Parameters such as signal-to-noise ratio (SNR), signal-to-distortion ratio (SDR), signal-to-noise-and-distortion ratio (SINAD), spurious-free dynamic range (SFDR), mean square error (MSE), and band power were used for evaluation. In future work, deep learning methods will be explored for this application, and hardware implementation will be carried out using new semiconductor devices.
We acknowledge our family, friends, and organization for their support in carrying out this research work.