A new three-parameter discrete distribution called the zero-inflated cosine geometric (ZICG) distribution is proposed herein. It can be used to analyze over-dispersed count data with excess zeros. The basic statistical properties of the new distribution, such as the moment generating function, mean, and variance, are presented. Furthermore, confidence intervals for the parameters of the ZICG distribution are constructed by using the Wald, Bayesian credible, and highest posterior density (HPD) interval methods. Their efficacies were investigated by using both simulated data and real-world data comprising the numbers of daily COVID-19 positive cases at the Tokyo 2020 Olympic Games. The results show that the HPD interval performed better than the other methods in terms of coverage probability and average length in most of the cases studied.
Over-dispersed count data with excess zeros occur in various situations, such as the number of torrential rainfall incidences at the Daegu and Busan rain gauge stations in South Korea [
The main idea of a ZI model is to add a proportion of zeros to the baseline distribution [
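As a minimal illustration of this construction, the sketch below builds a zero-inflated pmf from a geometric baseline (a stand-in for the CG pmf introduced below, which involves an extra cosine weight): the zero-inflated model places mass w + (1 − w)f(0) at zero and (1 − w)f(y) elsewhere.

```python
# Zero-inflation sketch: a proportion w of extra zeros is added to a
# baseline pmf f. The geometric baseline is only a stand-in for the CG pmf.

def geometric_pmf(y, p):
    """Baseline pmf on {0, 1, 2, ...}: P(X = y) = (1 - p) p^y."""
    return (1 - p) * p ** y

def zi_pmf(y, w, p):
    """Zero-inflated pmf: extra mass w at zero, baseline scaled by 1 - w."""
    base = geometric_pmf(y, p)
    return w + (1 - w) * base if y == 0 else (1 - w) * base

w, p = 0.3, 0.6
total = sum(zi_pmf(y, w, p) for y in range(500))        # should be ~1
mean_zi = sum(y * zi_pmf(y, w, p) for y in range(500))  # = (1 - w) * baseline mean
mean_base = p / (1 - p)                                 # geometric mean
```

The same pattern applies with any baseline pmf: the mean of the zero-inflated variable is always (1 − w) times the baseline mean.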
The cosine geometric (CG) distribution, a newly reported two-parameter discrete distribution belonging to the family of weighted geometric distributions [
Statistical tools such as confidence intervals provide more information than point estimation and
Besides the principal method involving maximum likelihood estimation widely used to estimate parameters in ZI count models, Bayesian analysis is also popular. For example, Cancho et al. [
Motivated by these previous studies, we herein propose Wald confidence intervals based on maximum likelihood estimation, Bayesian credible intervals, and highest posterior density (HPD) intervals for the three parameters of a ZICG distribution. Both simulated data and real-world data were used to compare the efficacies of the proposed methods for constructing confidence intervals via their coverage probabilities and average lengths.
The CG distribution is a two-parameter discrete distribution belonging to the family of weighted geometric distributions [
If
This section provides the cumulative distribution function (CDF), moment generating function (MGF), mean, and variance of a ZICG distribution, which are derived from the CG distribution [
Since the explicit expression for the moment using equality is
Since
Since
The index of dispersion (
The values of
0.1 | 1.2456 | 1.1078 | 1.7272 | 2.7955 | 1.7121 | 2.7514 | 11.4168 | 13.8715 | 11.3870 |
0.2 | 1.2491 | 1.1185 | 1.7296 | 2.8589 | 1.7963 | 2.8172 | 12.2676 | 14.5714 | 12.2392 |
0.3 | 1.2525 | 1.1293 | 1.7319 | 2.9222 | 1.8804 | 2.8831 | 13.1184 | 15.2713 | 13.0915 |
0.4 | 1.2560 | 1.1401 | 1.7343 | 2.9855 | 1.9645 | 2.9490 | 13.9691 | 15.9712 | 13.9437 |
0.5 | 1.2595 | 1.1508 | 1.7367 | 3.0489 | 2.0487 | 3.0149 | 14.8199 | 16.6711 | 14.7960 |
0.6 | 1.2629 | 1.1616 | 1.7390 | 3.1122 | 2.1328 | 3.0808 | 15.6707 | 17.3710 | 15.6482 |
0.7 | 1.2664 | 1.1724 | 1.7414 | 3.1756 | 2.2169 | 3.1466 | 16.5215 | 18.0708 | 16.5004 |
0.8 | 1.2699 | 1.1831 | 1.7437 | 3.2389 | 2.3010 | 3.2125 | 17.3723 | 18.7707 | 17.3527 |
0.9 | 1.2733 | 1.1939 | 1.7461 | 3.3023 | 2.3852 | 3.2784 | 18.2231 | 19.4706 | 18.2049 |
The likelihood function of the ZICG distribution is
In the case of a single homogeneous sample (
Here, we have
This provides the closed-form expression for
In this study, we assume that there is more than one unknown parameter; the assumed parameter vector is
Hence, the
Suppose
Let
Since the elements in set
Thus, the likelihood function based on augmented data
Since there is no prior information from historical data or previous experiments, we use noninformative priors for all of the parameters. The prior distributions for
Since the joint posterior distribution in
Thus, the marginal posterior distribution of
Here, we applied the random-walk Metropolis (RWM) algorithm to generate
The process proceeds as follows:
1. Choose a trial position.
2. Calculate the acceptance probability.
3. Generate U ~ Uniform(0, 1).
4. If U is less than the acceptance probability, accept the trial position; otherwise, retain the current position.
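The RWM updates described above can be sketched for a generic target density. The standard normal target below is purely a placeholder for the conditional posteriors used in this paper, and the step size `sigma` is an arbitrary choice:

```python
import math
import random

def rwm(logpdf, x0, n_iter, sigma=1.0, seed=1):
    """Random-walk Metropolis: propose y = x + N(0, sigma^2) and accept
    with probability min(1, pi(y)/pi(x))."""
    rng = random.Random(seed)
    x = x0
    chain = []
    for _ in range(n_iter):
        y = x + rng.gauss(0, sigma)                      # choose trial position
        alpha = math.exp(min(0.0, logpdf(y) - logpdf(x)))  # acceptance probability
        u = rng.random()                                 # U ~ Uniform(0, 1)
        if u < alpha:                                    # accept or reject
            x = y
        chain.append(x)
    return chain

# Placeholder target: standard normal log-density (up to a constant)
chain = rwm(lambda x: -0.5 * x * x, x0=0.0, n_iter=50_000)
samples = chain[1000:]  # discard burn-in
```

After burn-in, the retained draws should have mean near 0 and variance near 1 for this target.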
The Gibbs’ sampling steps are as follows:
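The general Gibbs pattern, alternately drawing each unknown from its full conditional given the current values of the others, can be sketched on a toy bivariate normal target (a hypothetical example, not the ZICG conditionals):

```python
import random

def gibbs_bvn(rho, n_iter, seed=7):
    """Gibbs sampler for a standard bivariate normal with correlation rho.
    Each full conditional is normal: x | y ~ N(rho * y, 1 - rho^2)."""
    rng = random.Random(seed)
    sd = (1 - rho * rho) ** 0.5
    x = y = 0.0
    draws = []
    for _ in range(n_iter):
        x = rng.gauss(rho * y, sd)  # draw x from its full conditional
        y = rng.gauss(rho * x, sd)  # draw y from its full conditional
        draws.append((x, y))
    return draws

draws = gibbs_bvn(rho=0.5, n_iter=20_000)[1000:]  # discard burn-in
```

The empirical correlation of the retained draws should approach the target correlation, which is how convergence of such a sampler is typically sanity-checked.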
The HPD interval is the shortest Bayesian credible interval containing a specified probability mass; the density at each point inside the interval is greater than that at each point outside of it. For a given probability (say 1 −
Bayesian credible intervals can be obtained by using the MCMC method [
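A common way to compute an HPD interval from MCMC output (the approach implemented by the `hdi` function used in the appendix) is to scan all intervals spanning ⌈(1 − α)N⌉ consecutive sorted draws and keep the shortest one:

```python
import math

def hpd_interval(samples, cred=0.95):
    """Shortest interval containing a fraction `cred` of the draws:
    among all windows of ceil(cred * N) consecutive order statistics,
    return the narrowest."""
    s = sorted(samples)
    n = len(s)
    m = math.ceil(cred * n)            # number of draws the interval must contain
    best = (s[0], s[m - 1])
    for i in range(n - m + 1):
        lo, hi = s[i], s[i + m - 1]
        if hi - lo < best[1] - best[0]:  # keep the shortest window
            best = (lo, hi)
    return best

# Right-skewed toy draws: the HPD interval trims the long upper tail,
# unlike an equal-tailed credible interval
draws = [0] * 50 + [1] * 30 + [2] * 15 + [3] * 5
```

On these toy draws the 95% HPD interval is (0, 2), discarding the sparse upper tail.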
Coverage probabilities and average lengths were used to compare the efficacies of the confidence intervals. Suppose the nominal confidence level is
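These two criteria can be illustrated with a simple simulation. The sketch below uses a plain normal-mean Wald interval rather than the ZICG intervals studied here: the coverage probability is the fraction of simulated intervals containing the true parameter, and the average length is the mean of their widths.

```python
import random

def wald_ci_performance(mu=0.0, sigma=1.0, n=30, reps=2000, z=1.96, seed=3):
    """Estimate coverage probability (CP) and average length (AL) of the
    95% Wald interval xbar +/- z * s / sqrt(n) for a normal mean."""
    rng = random.Random(seed)
    hits, lengths = 0, []
    for _ in range(reps):
        x = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(x) / n
        s2 = sum((v - xbar) ** 2 for v in x) / (n - 1)   # sample variance
        half = z * (s2 / n) ** 0.5
        lo, hi = xbar - half, xbar + half
        if lo <= mu <= hi:                               # interval covers truth
            hits += 1
        lengths.append(hi - lo)
    return hits / reps, sum(lengths) / reps

cp, al = wald_ci_performance()
```

A good interval attains CP close to the nominal level while keeping AL as short as possible; this is exactly the trade-off compared across the Wald, Bayesian, and HPD intervals in the tables below.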
Sample size
n | p | θ | w | Method | CPs | | | ALs | | |
---|---|---|---|---|---|---|---|---|---|---
50 | 0.5 | 1 | 0.1 | Wald | 0.527 | 0.809 | 0.026 | 114.231 | 47.931 | 2.743 |
Bayesian | 0.732 | 0.874 | 0.562 | 0.177 | 4.043 | |||||
HPD | 0.748 | 0.946 | 0.508 | 0.168 | 3.928 | |||||
0.5 | Wald | 0.5812 | 0.882 | 0.096 | 143.937 | 85.863 | 26.715 | |||
Bayesian | 0.875 | 0.866 | 0.752 | 0.283 | 4.020 | |||||
HPD | 0.941 | 0.819 | 0.946 | 0.706 | 0.267 | 3.908 | ||||
0.9 | Wald | 0.754 | 0.925 | 0.637 | 82.840 | 59.022 | 9.714 | |||
Bayesian | 0.921 | 0.844 | 0.864 | 0.886 | 0.512 | 3.985 | ||||
HPD | 0.855 | 0.65 | 0.897 | 0.843 | 0.469 | 3.836 | ||||
50 | 0.9 | 1 | 0.1 | Wald | 0.544 | 0.556 | 0.031 | 15865.933 | 512.228 | 0.011 |
Bayesian | 0.878 | 0.2507 | 0.111 | 3.878 | ||||||
HPD | 0.226 | 0.111 | 3.722 | |||||||
0.5 | Wald | 0.402 | 0.400 | 0.007 | 5550.402 | 483.188 | 0.019 | |||
Bayesian | 0.906 | 0.375 | 0.149 | 3.962 | ||||||
HPD | 0.371 | 0.148 | 3.863 | |||||||
0.9 | Wald | 0.722 | 0.722 | 0.027 | 790.778 | 544.465 | 0.136 | |||
Bayesian | 0.886 | 0.291 | 0.289 | 3.951 | ||||||
HPD | 0.948 | 0.267 | 0.280 | 3.823 | ||||||
50 | 0.5 | 3 | 0.1 | Wald | 0.872 | 0.819 | 0.004 | 174.914 | 65.643 | 0.871 |
Bayesian | 0.673 | 0.369 | 0.163 | 4.646 | ||||||
HPD | 0.652 | 0.323 | 0.160 | 4.460 | ||||||
0.5 | Wald | 0.911 | 0.024 | 59.560 | 20.094 | 1.538 | ||||
Bayesian | 0.804 | 0.896 | 0.592 | 0.204 | 4.447 | |||||
HPD | 0.645 | 0.850 | 0.537 | 0.195 | 4.300 | |||||
0.9 | Wald | 0.631 | 97.290 | 28.399 | 69.825 | |||||
Bayesian | 0.634 | 0.642 | 0.889 | 0.385 | 4.075 | |||||
HPD | 0.528 | 0.452 | 0.851 | 0.351 | 3.947 | |||||
50 | 0.9 | 3 | 0.1 | Wald | 0.487 | 0.393 | 0.000 | 13654.005 | 502.619 | 0.014 |
Bayesian | 0.240 | 0.116 | 4.720 | |||||||
HPD | 0.215 | 0.115 | 4.525 | |||||||
0.5 | Wald | 0.438 | 0.364 | 0.002 | 4698.163 | 463.454 | 0.027 | |||
Bayesian | 0.937 | 0.390 | 0.158 | 4.694 | ||||||
HPD | 0.921 | 0.383 | 0.157 | 4.518 | ||||||
0.9 | Wald | 0.775 | 0.712 | 0.013 | 628.775 | 428.654 | 0.242 | |||
Bayesian | 0.889 | 0.815 | 0.411 | 0.318 | 4.060 | |||||
HPD | 0.892 | 0.783 | 0.383 | 0.308 | 3.939 | |||||
100 | 0.5 | 1 | 0.1 | Wald | 0.480 | 0.694 | 0.005 | 116.671 | 55.914 | 0.211 |
Bayesian | 0.765 | 0.807 | 0.465 | 0.134 | 3.935 | |||||
HPD | 0.773 | 0.903 | 0.418 | 0.128 | 3.774 | |||||
0.5 | Wald | 0.485 | 0.778 | 0.016 | 90.840 | 67.565 | 1.003 | |||
Bayesian | 0.936 | 0.846 | 0.810 | 0.666 | 0.230 | 3.966 | ||||
HPD | 0.873 | 0.801 | 0.898 | 0.625 | 0.221 | 3.841 | ||||
0.9 | Wald | 0.707 | 0.767 | 0.397 | 351.284 | 6417.277 | 15.518 | |||
Bayesian | 0.926 | 0.821 | 0.841 | 0.756 | 0.484 | 3.991 | ||||
HPD | 0.877 | 0.688 | 0.909 | 0.700 | 0.454 | 3.856 | ||||
100 | 0.9 | 1 | 0.1 | Wald | 0.480 | 0.483 | 0.011 | 13083.192 | 408.086 | 0.006 |
Bayesian | 0.899 | 0.202 | 0.082 | 3.855 | ||||||
HPD | 0.927 | 0.187 | 0.082 | 3.679 | ||||||
0.5 | Wald | 0.380 | 0.387 | 0.007 | 4091.424 | 341.966 | 0.011 | |||
Bayesian | 0.867 | 0.263 | 0.111 | 3.876 | ||||||
HPD | 0.260 | 0.111 | 3.716 | |||||||
0.9 | Wald | 0.640 | 0.665 | 0.019 | 700.136 | 561.476 | 0.059 | |||
Bayesian | 0.947 | 0.908 | 0.159 | 0.216 | 3.985 | |||||
HPD | 0.152 | 0.214 | 3.888 | |||||||
100 | 0.5 | 3 | 0.1 | Wald | 0.725 | 0.720 | 0.006 | 88.315 | 31.167 | 0.325 |
Bayesian | 0.885 | 0.286 | 0.152 | 4.706 | ||||||
HPD | 0.866 | 0.250 | 0.149 | 4.510 | ||||||
0.5 | Wald | 0.895 | 0.830 | 0.008 | 52.324 | 19.127 | 1.030 | |||
Bayesian | 0.786 | 0.908 | 0.570 | 0.206 | 4.579 | |||||
HPD | 0.643 | 0.865 | 0.526 | 0.199 | 4.404 | |||||
0.9 | Wald | 0.927 | 0.889 | 0.301 | 116.852 | 50.701 | 14.860 | |||
Bayesian | 0.568 | 0.525 | 0.848 | 0.349 | 4.130 | |||||
HPD | 0.456 | 0.404 | 0.811 | 0.327 | 4.020 | |||||
100 | 0.9 | 3 | 0.1 | Wald | 0.401 | 0.276 | 0.000 | 10599.191 | 376.330 | 0.008 |
Bayesian | 0.190 | 0.088 | 4.717 | |||||||
HPD | 0.908 | 0.175 | 0.087 | 4.513 | ||||||
0.5 | Wald | 0.380 | 0.298 | 0.001 | 3484.650 | 334.218 | 0.012 | |||
Bayesian | 0.252 | 0.116 | 4.718 | |||||||
HPD | 0.250 | 0.115 | 4.519 | |||||||
0.9 | Wald | 0.755 | 0.700 | 0.005 | 489.154 | 330.362 | 0.089 | |||
Bayesian | 0.927 | 0.901 | 0.207 | 0.237 | 4.346 | |||||
HPD | 0.938 | 0.894 | 0.196 | 0.235 | 4.235 |
For sample size
Data for new daily COVID-19 cases during the Tokyo 2020 Olympic Games from 01 July 2021 to 12 August 2021 were used for this demonstration. The data are reported by the Tokyo Organizing Committee on the Government website (
Number of COVID-19 positive cases | 0 | 1 | 2 | 3 | 4 | 6 | 7 | 8 | 9 | 10 | 12 | 15 | 16 | 17 | 18 | 19 | 21 | 22 | 24 | 26 | 27 | 28 | 29 | 31 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Frequency | 9 | 2 | 4 | 2 | 1 | 2 | 1 | 1 | 1 | 2 | 1 | 2 | 2 | 1 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 1 |
The information in
n | Mean | Variance | SD | Skewness | Kurtosis | ID
---|---|---|---|---|---|---
43 | 10.6744 | 103.7010 | 10.1834 | 0.5478 | 1.9347 | 9.7149 |
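The summary statistics above can be reproduced directly from the frequency table (sample variance with divisor n − 1; the index of dispersion is variance/mean):

```python
# Frequency table of daily COVID-19 positive cases during Tokyo 2020
values = [0, 1, 2, 3, 4, 6, 7, 8, 9, 10, 12, 15, 16, 17, 18, 19,
          21, 22, 24, 26, 27, 28, 29, 31]
freqs  = [9, 2, 4, 2, 1, 2, 1, 1, 1, 2, 1, 2, 2, 1, 2, 1,
          1, 1, 1, 1, 1, 1, 2, 1]

n = sum(freqs)                                                    # 43 days
mean = sum(v * f for v, f in zip(values, freqs)) / n
var = sum(f * (v - mean) ** 2 for v, f in zip(values, freqs)) / (n - 1)
disp = var / mean                                                 # index of dispersion
# mean ~ 10.6744, var ~ 103.7010, disp ~ 9.7149 >> 1: strongly over-dispersed
```

The dispersion index of about 9.7, together with the 9 zero counts out of 43, is what motivates a zero-inflated over-dispersed model for these data.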
Distribution | ZICG | ZIG | ZIP | ZINB | CG | Geometric | Poisson | NB | Gaussian |
---|---|---|---|---|---|---|---|---|---|
−ln L | 142.0583 | 143.2602 | 217.0000 | 142.8000 | 146.3532 | 146.7716 | 302.7065 | 145.1130 | 160.3010
AIC | 290.1166 | 290.5204 | 437.9509 | 291.6952 | 296.7065 | 295.5432 | 607.4131 | 294.2259 | 324.6019 |
AICc | 290.7320 | 290.8204 | 438.2509 | 292.3106 | 297.0065 | 295.6408 | 607.5107 | 294.5259 | 324.9019 |
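The information criteria in the table follow the standard definitions AIC = 2k + 2(−ln L) and AICc = AIC + 2k(k + 1)/(n − k − 1). Reading the unlabeled first row as −ln L (which makes the arithmetic consistent across the fits), the ZICG entry with k = 3 parameters and n = 43 observations reproduces the tabulated values:

```python
def aic_aicc(neg_loglik, k, n):
    """Akaike information criterion and its small-sample correction."""
    aic = 2 * k + 2 * neg_loglik
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)
    return aic, aicc

# ZICG fit: k = 3 parameters, n = 43 daily counts, -ln L = 142.0583
aic, aicc = aic_aicc(142.0583, k=3, n=43)
# aic ~ 290.1166, aicc ~ 290.7320, matching the ZICG column of the table
```

The same arithmetic with k = 2 reproduces the ZIG, CG, NB, and Gaussian columns, and with k = 1 the geometric and Poisson columns, which is why the ZICG's smallest AIC/AICc identifies it as the best-fitting model here.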
The 95
Method | w | | p | | θ |
---|---|---|---|---|---|---
| 95% CI | Length of CI | 95% CI | Length of CI | 95% CI | Length of CI
Wald CI | (−0.0760, 0.3007) | 0.2247 | (0.8994, 0.9483) | 0.0489 | (0.4938, 0.4996) | 0.0058 |
Bayesian CI | (0.0004, 0.2413) | 0.2409 | (0.8683, 0.9689) | 0.1006 | (0.0217, 4.8781) | 4.8564 |
HPD interval | (0, 0.2111) | 0.2111 | (0.8667, 0.9667) | 0.1000 | (0.0153, 4.7192) | 4.7038 |
We proposed a new mixture distribution called the ZICG distribution and presented its properties, namely the MGF, mean, variance, and Fisher information. According to the empirical study results, the ZICG distribution is suitable for over-dispersed count data containing excess zeros, such as the number of daily COVID-19 positive cases at the Tokyo 2020 Olympic Games. Confidence intervals for the three parameters of the ZICG distribution were constructed by using the Wald confidence interval, the Bayesian credible interval, and the HPD interval. Since the maximum likelihood estimates of the ZICG model parameters have no closed form, the Newton-Raphson method was applied to estimate the parameters and construct the Wald confidence intervals. Furthermore, Gibbs’ sampling with the RWM algorithm was utilized in the Bayesian computation to approximate the parameters and construct the Bayesian credible intervals and the HPD intervals. Their performances were compared in terms of coverage probabilities and average lengths. According to the simulation results, the index of dispersion plays an important role: when it was small (e.g.,
The first author acknowledges the generous financial support from the Science Achievement Scholarship of Thailand (SAST).
rCGD<-
X =
i = 0
#
U =
#
cdf<-(2 * (1
/(2 + p * ((p - 3) *
{i = i + 1;
cdf = cdf + f(i, p, theta) }
X[j] = i;
}
}
(2 * (1
#
loglikeCG<-
p<-x[1]
theta<-x[2]
loglike<- n *
n *
n *
#
loglike<- -loglike
}
#
loglikeZICG<-
w<-x[1]
p<-x[2]
theta<-x[3]
n0<-
ypos<-y[
loglike<- n0 *
+
(p^ypos) *
loglike<- -loglike
}
gibbs<-
nonzero_values = y[
m<-n-
ypos = y[
prob.temp =
S.temp =
theta.temp =
w.temp =
p.temp =
p.temp[1]<-0.5 #
theta.temp[1]<-0.5 #
w.temp[1]<-0.5 #
prob.temp[1]<-w.temp[1]/(w.temp[1] + (1-w.temp[1]) * (2 * (1 - p.temp[1])
* 1 - 2 * p.temp[1] *
*
S.temp[1]<-
w.temp[1]<-
p.samp<-GenerateMCMC.p(y = y, N = 1000, n = n, m = m, w = w.temp[1], theta = theta.temp[1],
sigma = 1)
p.temp[1]<-
theta.samp<-GenerateMCMCsample(ypos = ypos, N = 1000, n = n, m = m, w = w.temp[1], p = p.temp[1],
sigma = 1)
theta.temp[1]<-
prob.temp[i]<-w.temp[i - 1]/(w.temp[i - 1] + (1-w.temp[i - 1]) *
theta.temp[i - 1]))
S.temp[i]<-
w.temp[i]<-
, n-S.temp[i] + 0.5)
p.samp<-GenerateMCMC.p(y = y, N = 1000, n = n, m = m, w = w.temp[i],
theta = theta.temp[i - 1], sigma = 1)
p.temp[i]<-
theta.samp<-GenerateMCMCsample(ypos = ypos, N = 1000, n = n, m = m, w = w.temp[i],
p = p.temp[i], sigma = 1)
theta.temp[i]<-
#
GenerateMCMCsample<-
prior <-
theta.samp <-
theta.samp[1] <-
Yj<-theta.samp[j - 1] +
alpha.cri <-((w + (1 - w) *
(2 + p * ((p - 3) *
((w + (1 - w) *
(1 - 2 * p *
(2 + p * ((p - 3) *
}
U <-
if (U <
{theta.samp[j]<- Yj
}
}
##
GenerateMCMC.p <-
prior<-
shape2 = 5)
p.samp <-
p.samp[1] <-
Yj<-p.samp[j - 1] +
alpha.cri <-((w + (1 - w) *
* prior(Yj)/((w + (1 - w) *
((1 - w) *
}
U <-
p.samp[j]<- p.samp[j - 1]}
p.samp[j]<- Yj
}
}
i = 0
suscept<-
y = suscept *
}i = i + 1
p_
p0 = p_
#
intvalues1 =
resultCG<-
mleCG<-resultCG
mleCG.p <-
mleCG.theta<-
#
n0<-
p1 = mleCG.p ; theta1 = mleCG.theta;
w1<-(n0 - n *
intvalues2 =
resultZICG<-
mleZICG<-resultZICG
hess<-resultZICG
stderr<-
mle.w <-
mle.p <-
mle.theta<-
#
#
Wald.w.L[i]<-mle.w -
#
Wald.w.U[i]<-mle.w +
Wald.CI.w =
Wald.CP.w[i] =
Wald.Length.w[i] = Wald.w.U[i] - Wald.w.L[i]
#
#
Wald.p.L[i]<-mle.p -
#
Wald.p.U[i]<-mle.p +
Wald.CI.p =
Wald.CP.p[i] =
Wald.Length.p[i] = Wald.p.U[i] - Wald.p.L[i]
#
#
Wald.theta.L[i]<-mle.theta -
#
Wald.theta.U[i]<-mle.theta +
Wald.CI.theta =
Wald.CP.theta[i] =
Wald.Length.theta[i] = Wald.theta.U[i] - Wald.theta.L[i]
#########End Wald CI#############
test<-gibbs(y = y,
n = n, p = p, theta = theta, w = w)
#
w.mcmc<-test[, 1][1001:3000]
#
w.bayes<-
#
p.mcmc<-test[, 2][1001:3000]
#
p.bayes<-
#
theta.mcmc<-test[, 3][1001:3000]
#
theta.bayes<-
#########
L.w[i] =
U.w[i] =
CIr1 =
Bayes.CP.w[i] =
Bayes.Length.w[i] = U.w[i] - L.w[i]
L.p[i] =
U.p[i] =
CIr2 =
Bayes.CP.p[i] =
Bayes.Length.p[i] = U.p[i] - L.p[i]
L.the[i] =
U.the[i] =
CIr3 =
Bayes.CP.the[i] =
Bayes.Length.the[i] = U.the[i] - L.the[i]
#########
w.hpd = hdi(w.mcmc, 0.95)
L.w.hpd[i] = w.hpd[1]
U.w.hpd[i] = w.hpd[2]
CIr4 =
CP.w.hpd[i] =
Length.w.hpd[i] = U.w.hpd[i] - L.w.hpd[i]
p.hpd = hdi(p.mcmc, 0.95)
L.p.hpd[i] = p.hpd[1]
U.p.hpd[i] = p.hpd[2]
CIr5 =
CP.p.hpd[i] =
Length.p.hpd[i] = U.p.hpd[i] - L.p.hpd[i]
theta.hpd = hdi(theta.mcmc, 0.95)
L.the.hpd[i] = theta.hpd[1]
U.the.hpd[i] = theta.hpd[2]
CIr6 =
CP.the.hpd[i] =
Length.the.hpd[i] = U.the.hpd[i] - L.the.hpd[i]
}