
At least one support vector exists.

The same discussion as in Chapter 8 (Support Vector Machine) of “Mathematics of Statistical Machine Learning” also appears in “The Elements of Statistical Learning” and in Bishop’s “Pattern Recognition and Machine Learning”. However, none of them explains why there exists at least one $i$ such that $y_i(\beta_0 + x_i\beta) = 1$. I addressed it in my book, but on careful reconsideration my argument turned out to be mistaken. Intuitively, the correct explanation is that “if there were no support vectors, the margin could be made larger”, but I had been wondering whether there is a shorter mathematical argument.
First, if $y_1, \ldots, y_N$ are all equal, the optimal solution is $\beta = 0$, and $y_i(\beta_0 + x_i\beta) = 1$ gives $\beta_0 = y_i$, so that equality holds for every $i$. In the remaining case, we derive a contradiction by assuming that the union of the sets $A = \{i \mid y_i(\beta_0 + x_i\beta) > 1 \}$ and $B = \{i \mid y_i(\beta_0 + x_i\beta) < 1 \}$ equals $\{1, \ldots, N\}$. Equations (8.18) and (8.19) are, respectively, \[ \beta = C\sum_{i\in B}y_ix_i^\top \] \[ \sum_{i\in B}y_i = 0 \] Taking the sum over $i = 1, \ldots, N$ in (8.16) and noting that $\alpha_i = 0$ for $i \in A$, we have
\[ C\sum_{i\in B}\{ y_i(\beta_0+x_i\beta)-(1-\epsilon_i)\} = 0 \] and substituting the above two equations, we get
\[ \beta^\top \beta = C\sum_{i\in B}(1-\epsilon_i) \] Substituting these into (8.3), we get
\[ L_P = C\Big\{-\frac{1}{2}\sum_{i\in B}(1-\epsilon_i)+\sum_{i\in A}\epsilon_i+N\Big\} \] For $i \in B$, (8.14) gives $\epsilon_i > 0$. Hence, without changing the membership of the sets $A$ and $B$, decreasing any $\epsilon_i$ with $i \in B$ decreases $L_P$ further, contradicting optimality. Moreover, if $B$ is empty and only $A$ remains, then (8.18) gives $\beta = 0$, so $y_i\beta_0 > 1$ holds for all $i \in A$, which forces $y_1 = \cdots = y_N$, the case excluded here. Therefore $A \cup B \neq \{1, \ldots, N\}$.
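In more detail, the substitution that yields $\beta^\top\beta = C\sum_{i\in B}(1-\epsilon_i)$ can be spelled out using only (8.18) and (8.19): \[ 0 = C\sum_{i\in B}\{ y_i(\beta_0+x_i\beta)-(1-\epsilon_i)\} = C\beta_0\sum_{i\in B}y_i + \Big(C\sum_{i\in B}y_ix_i\Big)\beta - C\sum_{i\in B}(1-\epsilon_i) = \beta^\top\beta - C\sum_{i\in B}(1-\epsilon_i), \] where the first term vanishes by (8.19) and the middle term equals $\beta^\top\beta$ because (8.18) gives $\beta^\top = C\sum_{i\in B}y_ix_i$.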

Therefore, in either case, there exists at least one $i$ such that $y_i(\beta_0+x_i\beta) = 1$.
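As a quick numerical illustration (a minimal sketch assuming scikit-learn and NumPy; the toy data and the choice $C = 1$ are arbitrary), we can fit a soft-margin linear SVM and confirm that some training point attains $y_i(\beta_0 + x_i\beta) = 1$ up to the solver's tolerance:

import numpy as np
from sklearn.svm import SVC

# Toy data: two overlapping Gaussian clouds labeled -1 and +1 (arbitrary choice)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, size=(20, 2)),
               rng.normal(+1.0, 1.0, size=(20, 2))])
y = np.array([-1] * 20 + [1] * 20)

# Soft-margin linear SVM; decision_function(x) computes beta_0 + x beta
clf = SVC(kernel="linear", C=1.0).fit(X, y)
margins = y * clf.decision_function(X)  # y_i (beta_0 + x_i beta) for each i

# At least one margin should equal 1 up to numerical accuracy
print("margin closest to 1:", margins[np.argmin(np.abs(margins - 1.0))])
print("some i attains equality:", bool(np.isclose(margins, 1.0, atol=1e-3).any()))

By the argument above, the second line should print True for any sample and any $C > 0$, since a solution with no point on the margin could not be optimal.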

Table of Contents: “Kernel Methods for Machine Learning with Math & R/Python” in the book series (Dec. 2021)

The new books will be available between Dec. 2021 and Jan. 2022, and they contain many source programs as well as mathematical propositions.

Chapter 1 Positive Definite Kernel
1.1 Positive Definiteness of Matrices
1.2 Kernels
1.3 Positive Definite Kernel
1.4 Probability
1.5 Bochner's Theorem
1.6 String and Tree Kernels

Chapter 2 Hilbert Space
2.1 Metric Space and Completeness
2.2 Linear Space and Inner Product Space
2.3 Hilbert Space
2.4 Projection Theorem
2.5 Linear Operator
2.6 Compact Operator

Chapter 3 Reproducing Kernel Hilbert Space
3.1 RKHS
3.2 Sobolev Space
3.3 Mercer's Theorem

Chapter 4 Kernel Computation
4.1 Kernel Ridge Regression
4.2 Kernel Principal Component Analysis
4.3 Kernel SVM
4.4 Splines
4.5 Random Fourier Features
4.6 Nyström Approximation
4.7 Cholesky Decomposition

Chapter 5 MMD and HSIC
5.1 Random Variables in an RKHS
5.2 MMD and the Two-Sample Problem
5.3 Testing Independence via HSIC
5.4 Characteristic and Universal Kernels
5.5 Introduction to Empirical Processes

Chapter 6 Gaussian Processes and Functional Data Analysis
6.1 Regression
6.2 Classification
6.3 Auxiliary Variable Method
6.4 Karhunen-Loève Expansion
6.5 Functional Data Analysis