At least one support vector exists.

The same discussion as in Chapter 8 (Support Vector Machine) of “Mathematics of Statistical Machine Learning” also appears in “The Elements of Statistical Learning” and in Bishop’s “Pattern Recognition and Machine Learning”. However, none of them explain why there exists at least one $i$ such that $y_i(\beta_0 + x_i\beta) = 1$. I did address it in my book, but on closer inspection my explanation turned out to be mistaken. Intuitively, the correct explanation is that “if there were no support vectors, the margin could be made larger”, but I had been wondering whether there was a more concise mathematical argument.
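Since the argument below cites the book’s equation numbers, let me first record the standard soft-margin formulation that I take (8.3) and (8.14)–(8.19) to refer to (matching the numbers to the conditions is my assumption; the conditions themselves are the usual KKT setup, with $x_i$ a row vector so that $x_i\beta$ is a scalar). With slack variables $\epsilon_i \ge 0$ and multipliers $\alpha_i, \mu_i \ge 0$, the primal Lagrangian is \[ L_P = \frac{1}{2}\beta^\top\beta + C\sum_{i=1}^N \epsilon_i - \sum_{i=1}^N \alpha_i\{y_i(\beta_0 + x_i\beta) - (1-\epsilon_i)\} - \sum_{i=1}^N \mu_i\epsilon_i \] and at the optimum \[ y_i(\beta_0 + x_i\beta) \ge 1 - \epsilon_i, \qquad \alpha_i\{y_i(\beta_0 + x_i\beta) - (1-\epsilon_i)\} = 0, \qquad \mu_i\epsilon_i = 0, \] \[ \beta = \sum_{i=1}^N \alpha_i y_i x_i^\top, \qquad \sum_{i=1}^N \alpha_i y_i = 0, \qquad \alpha_i = C - \mu_i. \] In particular, $y_i(\beta_0 + x_i\beta) > 1$ forces $\alpha_i = 0$, and $\epsilon_i > 0$ forces $\mu_i = 0$ and hence $\alpha_i = C$; this is how the sums restricted to $B$ appear below.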
First, if $y_1, \ldots, y_N$ are all equal and the optimal $\beta$ is $0$, then $y_i(\beta_0 + x_i\beta) = 1$ reduces to $y_i\beta_0 = 1$, i.e. $\beta_0 = y_i$ (since $y_i \in \{\pm 1\}$), which all $i$ satisfy simultaneously because the labels are equal. In the remaining cases, we assume that the union of the disjoint sets $A = \{i \mid y_i(\beta_0 + x_i\beta) > 1 \}$ and $B = \{i \mid y_i(\beta_0 + x_i\beta) < 1 \}$ equals $\{1, \ldots, N\}$ and derive a contradiction. Under this assumption, (8.18) and (8.19) become, respectively, \[ \beta = C\sum_{i\in B}y_ix_i^\top \] \[ \sum_{i\in B}y_i = 0 \] Summing (8.16) over $i = 1, \ldots, N$, and noting that $\alpha_i = 0$ for $i \in A$, we have
\[ C\sum_{i\in B}\{ y_i(\beta_0+x_i\beta)-(1-\epsilon_i)\} = 0 \] and substituting (8.18) and (8.19) into this, we get
\[ \beta^\top \beta = C\sum_{i\in B}(1-\epsilon_i) \] Substituting this into (8.3), we get
\[ L_P = C\Bigl\{-\frac{1}{2}\sum_{i\in B}(1-\epsilon_i)+\sum_{i\in A}\epsilon_i+|B|\Bigr\} \] (the algebra behind these two substitutions is spelled out below). For $i \in B$, the constraint (8.14) gives $\epsilon_i \ge 1 - y_i(\beta_0 + x_i\beta) > 0$. Hence, keeping the memberships of $A$ and $B$ fixed, decreasing any $\epsilon_i$ with $i \in B$ would decrease the value of $L_P$ further, contradicting optimality. Also, if $B$ is empty and only $A$ remains, then (8.18) gives $\beta = 0$, so $y_i\beta_0 > 1$ holds for all $i \in A = \{1, \ldots, N\}$, which forces $y_1 = \cdots = y_N$ (the case already treated above). Therefore $A \cup B \neq \{1, \ldots, N\}$.
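For completeness, here is the algebra behind the two substitutions above (my own computation, under the formulation recorded earlier). Using $\sum_{i\in B} y_i = 0$ from (8.19) and $C\sum_{i\in B} y_i x_i = \beta^\top$ from (8.18), \[ 0 = C\sum_{i\in B}\{y_i(\beta_0+x_i\beta)-(1-\epsilon_i)\} = C\beta_0\sum_{i\in B} y_i + C\Bigl(\sum_{i\in B} y_i x_i\Bigr)\beta - C\sum_{i\in B}(1-\epsilon_i) = \beta^\top\beta - C\sum_{i\in B}(1-\epsilon_i). \] At a KKT point the complementary-slackness terms vanish, so (8.3) reduces to $L_P = \frac{1}{2}\beta^\top\beta + C\sum_{i=1}^N \epsilon_i$; writing $\sum_{i\in B}\epsilon_i = |B| - \sum_{i\in B}(1-\epsilon_i)$ and using $A \cup B = \{1, \ldots, N\}$, \[ L_P = \frac{C}{2}\sum_{i\in B}(1-\epsilon_i) + C\sum_{i\in A}\epsilon_i + C|B| - C\sum_{i\in B}(1-\epsilon_i) = C\Bigl\{-\frac{1}{2}\sum_{i\in B}(1-\epsilon_i) + \sum_{i\in A}\epsilon_i + |B|\Bigr\}. \]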

Therefore, in either case, there exists at least one $i$ such that $y_i(\beta_0+x_i\beta) = 1$.
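Finally, a quick numerical sanity check (a minimal sketch of my own, not from the book, using scikit-learn’s linear SVC; the data and the choice $C = 1$ are arbitrary): at the fitted optimum, at least one training point should attain $y_i(\beta_0 + x_i\beta) = 1$ up to solver tolerance.

```python
# Numerical sanity check (my addition, not from the post): fit a soft-margin
# linear SVM and verify that some training point attains
# y_i (beta_0 + x_i beta) = 1, up to solver tolerance.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two overlapping Gaussian clouds labeled -1 and +1.
X = np.vstack([rng.normal(loc=-1.0, size=(20, 2)),
               rng.normal(loc=+1.0, size=(20, 2))])
y = np.array([-1] * 20 + [+1] * 20)

model = SVC(kernel="linear", C=1.0).fit(X, y)
margins = y * model.decision_function(X)  # y_i (beta_0 + x_i beta)

# Points with margin < 1 form the set B (they have epsilon_i > 0);
# the claim is that at least one margin equals exactly 1, so this prints ~0.
print("min_i |y_i (beta_0 + x_i beta) - 1| =", np.abs(margins - 1).min())
```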