Let's understand how we went from step 2 to step 3. \(x^*\) is an optimal solution for the primal, but for \(g\) it is just some feasible point; there is no guarantee that it minimizes the Lagrangian. Therefore the value at \(x^*\) must be greater than or equal to the actual minimum of \(f(x)+\sum\limits_{i=1}^mu^{*}_ih_i(x)+\sum\limits_{j=1}^rv^{*}_jl_j(x)\) over \(x\).
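In symbols, this is exactly the step 2 \(\to\) step 3 move written out: plugging the particular point \(x^*\) into the Lagrangian can only give a value at least as large as the minimum over all \(x\),
\[
\min_x\; f(x)+\sum_{i=1}^mu^{*}_ih_i(x)+\sum_{j=1}^rv^{*}_jl_j(x)\;\leq\; f(x^{*})+\sum_{i=1}^mu^{*}_ih_i(x^{*})+\sum_{j=1}^rv^{*}_jl_j(x^{*}).
\]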
Step 4: \(\leq f(x^*)\)
Let's understand how we went from step 3 to step 4. The argument has two parts.
Let's first focus on \(\sum\limits_{j=1}^rv^{*}_jl_j(x^*)\).
We already know from eqn (3) that if \(x^*\) respects the constraints then \(l_j(x^*)=0\), so \(v^{*}_jl_j(x^*)\) must be 0 for every \(j\). The summation over all individual \(j\)-s will therefore also be 0, so the term vanishes entirely.
Now let's focus on \(\sum\limits_{i=1}^mu^{*}_ih_i(x^*)\).
From eqn (4) we know that \(h_i(x^*)\) must be \(\le 0\) (primal feasibility), and from eqn (5) we know that \(u^{*}_i\) must be \(\ge 0\) (dual feasibility). As a result \(u^{*}_ih_i(x^*)\), a non-negative value multiplied by a non-positive one, must be \(\le 0\) too. When we sum up over all \(i\), each term individually \(\le 0\), the resulting summation is \(\le 0\) as well, which is exactly what gives the \(\leq f(x^*)\) in step 4.
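For instance (numbers of my own choosing): \(u^{*}_i=2\) and \(h_i(x^{*})=-0.5\) give \(u^{*}_ih_i(x^{*})=-1\le 0\).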
Now recall that strong duality says \(g(u^{*},v^{*})=f(x^{*})\), so every \(\leq\) in the chain above must actually hold with equality. Let's check this term by term.
\(\sum\limits_{j=1}^rv^{*}_jl_j(x^{*})\): We have already shown that this term is 0, so we don't have to worry about it.
\(\sum\limits_{i=1}^mu^{*}_ih_i(x^*)\):
Now we have to show that this term is 0 in order for the inequality to become an equality. But we saw earlier that this term is only guaranteed to be \(\le 0\), not 0. So now comes KKT condition 2 - the complementary slackness condition. It enforces that \(u^{*}_ih_i(x^{*})=0\) for every \(i\). Since each term is 0, the summation is entirely 0. This makes the inequality into an equality. Without KKT condition 2 this would not have been possible.
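As a quick illustration of the two ways each product can vanish, take a one-dimensional toy problem of my own choosing, \(\min_x x^2\) with a single inequality constraint: either the constraint is active and \(h(x^{*})=0\), or it is inactive and \(u^{*}=0\).
\[
h(x)=1-x\le 0:\quad x^{*}=1,\; u^{*}=2,\; u^{*}h(x^{*})=2\cdot 0=0 \quad\text{(active)}
\]
\[
h(x)=-1-x\le 0:\quad x^{*}=0,\; u^{*}=0,\; u^{*}h(x^{*})=0\cdot(-1)=0 \quad\text{(inactive)}
\]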
\(\leq\) in step 3
Now let's handle the only inequality left. This was slightly difficult for me to come to terms with.
We already saw that \(x^*\) is not necessarily an optimal solution for the LHS (that is, for the minimization defining \(g(u^{*},v^{*})\)) but just some feasible point. So for the inequality to become an equality we have to somehow enforce that \(x^*\) is also an optimal solution for that minimization.
Let's focus only on the LHS: \(\min_{x}\;f(x)+\sum\limits_{i=1}^mu^{*}_ih_i(x)+\sum\limits_{j=1}^rv^{*}_jl_j(x)\)
Note that from eqn (1), \(L(x,u^{*},v^{*})=f(x)+\sum\limits_{i=1}^mu^{*}_ih_i(x)+\sum\limits_{j=1}^rv^{*}_jl_j(x)\) . So the LHS can also be written as \(\min_x\;L(x,u^{*},v^{*})\).
Say the optimal solution after minimizing with respect to \(x\) is \(\tilde x\). Note that, as of now, \(\tilde x\) need not equal \(x^*\). That is what we have to enforce, but we will get there soon. For now remember that \(\tilde x\) may be different and is the optimal solution for the above.
If \(\tilde x\) is the optimal solution, then the subgradient optimality condition at \(\tilde x\) gives \(0 \in \partial L(\tilde x,u^{*},v^{*})\). (Since we are working with convex functions, when \(L\) is differentiable the subgradient is just the gradient, and this reads \(\nabla L(\tilde x,u^{*},v^{*}) = 0\).)
Now back to enforcing that \(\tilde x = x^*\). Only if \(\tilde x = x^*\) does the inequality become an equality. This is where KKT condition 1 - the stationarity condition - comes into play: it enforces exactly this. If \(\tilde x = x^*\), then, as we saw earlier, \(0 \in \partial L(x^{*},u^{*},v^{*})\), which is the same as \(0\in\partial\left(f(x^{*})+\sum\limits_{i=1}^mu^{*}_ih_i(x^{*})+\sum\limits_{j=1}^rv^{*}_jl_j(x^{*})\right)\).
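Under mild regularity conditions the subdifferential of this sum is the sum of the subdifferentials of the individual terms, so the condition can be written in its usual form:
\[
0\in\partial f(x^{*})+\sum_{i=1}^m u^{*}_i\,\partial h_i(x^{*})+\sum_{j=1}^r v^{*}_j\,\partial l_j(x^{*})
\]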
We have now shown that both KKT condition 1 and condition 2 are required to prove necessity. Conditions 3 and 4 are required by default to enforce feasibility of the primal and the dual problem, hence their names.
Hence necessity is proved.
Sufficiency
\[\text{Optimality} \impliedby \text{KKT}\]
Now let's try to prove the other direction. Compared to necessity, sufficiency is much easier to prove and understand. Here we assume the KKT conditions are true, and we need to prove optimality.
Say \(x^{*}, u^{*}, v^{*}\) satisfy the KKT conditions; then we have to prove that they are the optimal values for the primal and the dual. That is, we have to show \(g(u^{*}, v^{*})=f(x^{*})\).
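The whole argument is one short chain of equalities. The first equality is just the definition of the dual function \(g\), the third is the definition of the Lagrangian from eqn (1), and the remaining two are justified below:
\[
g(u^{*},v^{*})=\min_x L(x,u^{*},v^{*})=L(x^{*},u^{*},v^{*})=f(x^{*})+\sum_{i=1}^m u^{*}_ih_i(x^{*})+\sum_{j=1}^r v^{*}_jl_j(x^{*})=f(x^{*})
\]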
The second equality is true because of KKT condition 1 - the stationarity condition. Since from it we already know that \(0\in\partial\left(f(x^{*})+\sum\limits_{i=1}^mu^{*}_ih_i(x^{*})+\sum\limits_{j=1}^rv^{*}_jl_j(x^{*})\right)\), \(x^*\) is an optimal solution for the \(\min_x\).
The final equality needs both summations to vanish. The first is zero because of KKT condition 2 - the complementary slackness condition. Since from it we already know that \(u^{*}_ih_i(x^{*})=0\), the summation of the individual terms will also be zero.
By KKT condition 3 - the primal feasibility condition - we already know that \(l_j(x^{*})=0\), so the third term becomes 0 too.
Hence sufficiency is proved.
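To make the four conditions concrete, here is a quick numeric sanity check in Python. The toy problem, the point \(x^{*}=(0.5, 1.5)\) and the multiplier \(u^{*}=1\) are my own (obtained by solving the stationarity system by hand), not from the lecture:

```python
import numpy as np

# Toy problem:
#   minimize   f(x) = (x1 - 1)^2 + (x2 - 2)^2
#   subject to h(x) = x1 + x2 - 2 <= 0   (no equality constraints)

x_star = np.array([0.5, 1.5])  # candidate primal solution
u_star = 1.0                   # candidate dual solution

grad_f = 2.0 * (x_star - np.array([1.0, 2.0]))  # gradient of f at x*
grad_h = np.array([1.0, 1.0])                   # gradient of h at x*
h_val = x_star.sum() - 2.0                      # h(x*)

# KKT condition 1 - stationarity: grad f(x*) + u* grad h(x*) = 0
print("stationarity:   ", np.allclose(grad_f + u_star * grad_h, 0.0))
# KKT condition 2 - complementary slackness: u* h(x*) = 0
print("comp. slackness:", np.isclose(u_star * h_val, 0.0))
# KKT condition 3 - primal feasibility: h(x*) <= 0
print("primal feasible:", h_val <= 1e-9)
# KKT condition 4 - dual feasibility: u* >= 0
print("dual feasible:  ", u_star >= 0.0)
```

All four checks print `True`; nudging \(x^{*}\) or \(u^{*}\) away from these values makes the stationarity check fail, which is a handy way to convince yourself the conditions are tight.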
Conclusion
We have now effectively proved both directions by proving necessity and sufficiency. To sum it up,
For a constrained optimization problem, if \(x^{*}, u^{*}, v^{*}\) satisfy the KKT conditions, then satisfying those conditions is sufficient to conclude that \(x^{*}, u^{*}, v^{*}\) are the optimal primal and dual solutions for that problem.
Also, if strong duality holds and there exist optimal solutions \(x^{*}, u^{*}, v^{*}\) for that problem, then they must necessarily satisfy the KKT conditions.
Acknowledgements
I heavily borrowed from Ryan Tibshirani's video lecture and notes for understanding KKT conditions. There are other lectures by Ryan explaining KKT conditions, but I personally felt this one was much easier to understand.
Thank you Piyushi for helping me understand this topic and proof-reading this blog!