
AI/Machine Learning (Computer Vision/NLP) Orientation Interview Review 1


1. Judging full binary trees

Every node has degree either 0 or 2, and all leaf nodes are on the last level. (Note: the code below only checks the degree condition.)

#include <iostream>
using namespace std;
class TreeNode {
public:
	int val;
	TreeNode* left;
	TreeNode* right;
	// When constructed with a single argument x, assign x to val and nullptr to left and right
	TreeNode(int x) : val(x), left(nullptr), right(nullptr) {};
	TreeNode(int x, TreeNode* l, TreeNode* r) : val(x), left(l), right(r) {};
};
bool isfull(TreeNode* cur) {
	if (cur == nullptr) return true;
	// A node with exactly one child violates the degree-0-or-2 rule
	if ((cur->left == nullptr) != (cur->right == nullptr)) return false;
	// Both children present (or both absent): recurse and combine the results
	return isfull(cur->left) && isfull(cur->right);
}

int main() {
	int x = 1;
	TreeNode* left = new TreeNode(1);
	TreeNode* right = new TreeNode(1);
	TreeNode* root = new TreeNode(1, left, nullptr);  // only a left child
	cout << isfull(root) << endl;  // prints 0 (false): root has degree 1

}

The point here is to define the TreeNode class, write its constructors, and call a constructor to create each node.

2. Given a number, find the square root of the number without using the built-in function.

Bisection (binary search) solution, implemented recursively.

#include <iostream>
#include <cmath>
using namespace std;

float n;
const float e = 0.001;  // tolerance
// Bisection on [left, right]: return mid once mid*mid is within e of n
float findsquare(float left, float right) {
	float mid = (left + right) / 2;
	if (fabs(mid * mid - n) < e) {
		return mid;
	}
	else if (mid * mid > n) {
		return findsquare(left, mid);   // root is in the lower half
	}
	else {
		return findsquare(mid, right);  // root is in the upper half
	}
}
int main() {
	cin >> n;
	// note: for n < 1 the true root exceeds n, so the right endpoint should really be max(n, 1.0f)
	cout << findsquare(0, n) << endl;
}

3. GAN model content

Image generation models: the two I understand better are GANs and diffusion models.

The basic GAN pipeline: the Generator can be any network that outputs a 2D image, such as a DNN or CNN.

The Discriminator typically takes an image as input and outputs real or fake.

In each round, the real reference input should be recognized as real by the Discriminator, while the Generator's output should be recognized as fake.

Both the Generator's and the Discriminator's losses are binary cross-entropy, i.e. they score the probability that data is real. The Discriminator minimizes this cross-entropy (classify real as real and generated as fake), while the Generator tries to maximize the Discriminator's cross-entropy, i.e. get its fake outputs classified as real.
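
For reference, this is the standard minimax objective from the original GAN formulation (not spelled out in the post above):

$$\min_G \max_D V(D,G) = \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$$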

4. Diffusion model contents

It starts with some math prerequisites:

The conditional probability formula (chain rule).

Based on the Markov assumption, the current state's probability depends only on the previous step and not on earlier moments, so the extra conditioning terms in the chain-rule factorization can be dropped.
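
Concretely (standard notation, added here as a reference): the chain rule factorizes the joint distribution, and the Markov assumption collapses each conditional to the previous step only:

$$P(x_1, \dots, x_T) = \prod_{t=1}^{T} P(x_t \mid x_{t-1}, \dots, x_1) = \prod_{t=1}^{T} P(x_t \mid x_{t-1})$$

(the second equality holds under the Markov assumption).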

The KL divergence formula for Gaussian distributions:
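
The formula itself seems to have been an image in the original; for two univariate Gaussians it is the standard expression:

$$D_{\mathrm{KL}}\big(\mathcal{N}(\mu_1,\sigma_1^2)\,\|\,\mathcal{N}(\mu_2,\sigma_2^2)\big) = \log\frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 + (\mu_1-\mu_2)^2}{2\sigma_2^2} - \frac{1}{2}$$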

Reparameterization trick: treat the random noise as the network input and the mean and standard deviation as network outputs, so that gradients can be taken through them.
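
Written out (standard form, added here): sample

$$z = \mu + \sigma \odot \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, I)$$

so the randomness is isolated in $\varepsilon$ while gradients flow through $\mu$ and $\sigma$.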

The multi-latent (hierarchical) VAE objective: infer z from x, then use z to reconstruct x; the hierarchical version has multiple latent variables z.


The diffusion model consists of two main processes. First, diffusion from the target distribution to a noise distribution, a process of increasing entropy.

Then the target distribution is predicted back from the noise distribution. Training fits this reverse model so that the desired target distribution can be obtained from randomly generated (e.g. Gaussian) noise.

In the usual convention the forward diffusion process is written q and the learned reverse process p; the drift is the difference between the two.
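
For reference (standard DDPM notation, not in the original post): each forward step adds a small amount of Gaussian noise,

$$q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\big)$$

and the reverse process $p_\theta(x_{t-1} \mid x_t)$ is trained to invert it.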

5. Binary tree creation, insertion and deletion

This should be a binary search tree (BST): the left child is smaller than the node itself and the right child is larger.

Deletion is skipped for now (a sketch is added after the code below).

#include<iostream>
using namespace std;

class TreeNode {
public:
	int val;
	TreeNode* left;
	TreeNode* right;
	TreeNode(int x) :val(x), left(nullptr), right(nullptr) {};
	TreeNode(int x, TreeNode* l, TreeNode* r) : val(x), left(l), right(r) {};
};
TreeNode* insert(TreeNode* cur, int x) {
	if (cur == nullptr) {
		return new TreeNode(x);
	}
	if (x < cur->val) {
		cur->left = insert(cur->left, x);
	}
	else if (x > cur->val) {
		cur->right = insert(cur->right, x);
	}
	return cur;

}
//Deletion is a bit more involved; left as a stub here
TreeNode* deleteNode(TreeNode* cur, int val) {
	if (cur == nullptr) {
		return cur;
	}
	return cur;  // stub: deletion logic not implemented here
}
int main() {
	int x = 1;
	TreeNode* root = new TreeNode(x);
	insert(root, 2);
}
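
Since the post leaves deleteNode unwritten, here is a minimal sketch of the standard BST deletion using the in-order successor; the name deleteNodeFull and the code are my addition for completeness, not the original author's:

// Delete val from the BST rooted at cur; returns the new subtree root.
TreeNode* deleteNodeFull(TreeNode* cur, int val) {
	if (cur == nullptr) return cur;
	if (val < cur->val) {
		cur->left = deleteNodeFull(cur->left, val);
	}
	else if (val > cur->val) {
		cur->right = deleteNodeFull(cur->right, val);
	}
	else {
		// Node found: with 0 or 1 child, splice it out
		if (cur->left == nullptr) { TreeNode* r = cur->right; delete cur; return r; }
		if (cur->right == nullptr) { TreeNode* l = cur->left; delete cur; return l; }
		// Two children: copy the in-order successor's value, then delete that value from the right subtree
		TreeNode* succ = cur->right;
		while (succ->left != nullptr) succ = succ->left;
		cur->val = succ->val;
		cur->right = deleteNodeFull(cur->right, succ->val);
	}
	return cur;
}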

6. Linux-related commands:

top: view process information
df -h: view disk usage
ps aux: view all processes
kill -9 pid: force-kill the process with the given pid
chmod: modify permissions
grep: search a file for lines containing a string
wc -l: count lines
cut: split a line into fields
echo $PATH | cut -d ':' -f 3,5: split PATH on ':' and output the 3rd and 5th fields
find -name: find files by name
vim: open a file for viewing/editing
head -3: show the first three lines
Docker:
docker ps -a: list all containers
docker attach: attach to a running container
docker exec: run a command inside a running container (e.g. open a shell in it)
docker run: run a container
vim commands: ":n" go to line n, "dd" delete the current line, ":q!" quit without saving, ":wq" save and quit, "gg=G" re-indent the whole file
ssh: log in to a server; scp -r: copy files/directories to or from a server

7. Rapid sorting

#include <bits/stdc++.h>
using namespace std;
const int N = 100001;
void quicksort(int* arr, int l, int r) {
    if (l >= r) return;
    int i = l - 1, j = r + 1;      // one step outside; the do-while moves them in first
    int x = arr[(l + r) / 2];      // pivot value
    while (i < j) {
        do i++; while (arr[i] < x);
        do j--; while (arr[j] > x);
        if (i < j) swap(arr[i], arr[j]);
    }
    quicksort(arr, l, j);          // arr[l..j] <= x
    quicksort(arr, j + 1, r);      // arr[j+1..r] >= x
}
int main(){
    int n;
    cin>>n;
    int arr[N];
    for(int i =0;i<n;i++){
        cin>>arr[i];
    }
    quicksort(arr,0,n-1);
    for(int i =0;i<n;i++){
        cout<<arr[i]<<" ";
    }cout<<endl;
}

Points I always forget: it's do first, then while; i and j are initialized to l-1 and r+1 because they increment/decrement at the top of the do-while before any element is compared.

quicksort(arr, l, j); here you can't use i in place of j. When the partition loop ends, every element of arr[l..j] is <= x and every element of arr[j+1..r] is >= x, so splitting at j gives exactly the boundary the recursion needs; splitting at i does not guarantee that the left segment is <= x and the right segment is >= x.

8. xgboost and deepfm performance.

(1) Structure of xgboost: a model made of multiple regression decision trees. A new tree is added at each step (forward stagewise additive modeling with a greedy strategy), optimizing the base learners one at a time.

When optimizing the t-th tree, the parameters of the first t-1 trees are already fixed. Each round's objective is the loss over the n samples plus a regularization term.

The regularization term is the complexity of the first t trees, determined by the number of leaf nodes and the sum of squares of each leaf's value w. It exists to prevent overfitting: the more leaf nodes there are, the easier it is to overfit, and large leaf values also tend to overfit.
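
In symbols (the standard xgboost objective, added for reference):

$$\text{Obj}^{(t)} = \sum_{i=1}^{n} l\big(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\big) + \Omega(f_t), \qquad \Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2$$

where T is the number of leaves and $w_j$ the leaf values.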

In machine learning, parameters are usually optimized by gradient descent, but a tree model is built stepwise and its structure is discrete, so you can't take a gradient with respect to it. Instead, xgboost computes each leaf node's contribution to the loss from the per-sample first-order and second-order derivatives of the loss, and uses them to score candidate splits and find the optimal split point.
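
With G and H the sums of the first- and second-order derivatives $g_i$, $h_i$ over the samples in a leaf, the standard closed forms (added here for reference) are:

$$w^* = -\frac{G}{H + \lambda}, \qquad \text{Gain} = \frac{1}{2}\left[\frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda}\right] - \gamma$$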

(2) How can xgboost be used for recommender systems?

The user's contextual information is fed into xgboost as features to predict the probability that the user clicks, so xgboost is doing regression (CTR prediction).

Since it is a regression task, each iteration chooses split points for the leaf nodes, and each leaf produces a value; the trees' outputs are combined as a weighted sum to give the click probability. The number of trees and of nodes per tree are hyperparameters.

(3) How is xgboost parallelized?

The parallelism is in the search for the optimal split point: the features are stored in pre-sorted blocks, the gain of each candidate split is computed per feature in parallel, then the results are synchronized and the feature with the largest gain is chosen for the split.

What is the improvement of xgboost over Gradient Boosted Trees (GBDT)?

Second-order derivative information is introduced, which makes the optimization more accurate than traditional GBDT, which uses only first-order gradient information.

(4) Deepfm algorithm:

(4) The DeepFM algorithm: Deep + Factorization Machine. An FM component does the low-order feature interactions, and a separate DNN does the higher-order interactions; the overall output is a weighted sum of the two.

What is a factorization machine (FM)?

FM is an extension of SVM better suited to sparse features. It mainly models the pairwise (cross) relationships between features, like an SVM polynomial kernel with inner products, but with a factorized parameterization instead of SVM's dense one, which gives FM far fewer parameters and makes it much cheaper to compute. The interaction parameters are trained in a matrix-factorization style.
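
The second-order FM model (standard form, added for reference):

$$\hat{y}(x) = w_0 + \sum_{i=1}^{d} w_i x_i + \sum_{i=1}^{d}\sum_{j=i+1}^{d} \langle v_i, v_j \rangle\, x_i x_j$$

where each feature i gets a latent vector $v_i$, so the interaction weight of a feature pair is an inner product rather than a free parameter.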

For example, movie-rating data modeled with one-hot vectors yields features that are very sparse and very long. A factorization machine is a modified second-order polynomial model that captures similarity between latent vectors, e.g. liking one type of movie implies liking a related type. (Principles of FM (Factorization Machine) model for recommender system and code practice - JenBook)

Essentially DeepFM is used to rank the candidate set produced by the recall phase, so what it does here is ranking, not recall. Training uses the Adam optimizer.

(5) Why use deepfm on large-scale datasets?

When dealing with user-behavior and implicit-feedback data, DeepFM can capture complex nonlinear relationships through its deep part and performs better. It has advantages in large-scale recommender systems, such as ad recommendation and product recommendation.

It is suitable for large-scale data and automatic feature learning scenarios, and is especially good at handling high-dimensional sparse features. However, it requires a large amount of data and computational resources to fully utilize its advantages.

9. Determining whether a linked list has a cycle

You can use a hash set or the fast/slow pointer method. With fast/slow pointers be careful to check fast and fast->next for null before advancing, otherwise you dereference past the end of the list; and the two pointers must not start at the same node here, otherwise a single-node list would immediately report a cycle.

    bool hasCycle(ListNode *head) {
        if (head == nullptr) return false;
        ListNode* slow = head;
        ListNode* fast = head->next;
        while(slow != fast){
            if(fast == nullptr || fast->next == nullptr) return false;
            slow = slow->next;
            fast = fast->next->next;
        }
        return true;
    }

Hash-set method: note that insertion uses insert and membership testing uses count:

bool hasCycle(ListNode *head) {
        unordered_set<ListNode*> sets;
        ListNode* cur = head;
        while (cur != nullptr) {
            if (sets.count(cur)) return true;  // node seen before: there is a cycle
            sets.insert(cur);
            cur = cur->next;
        }
        return false;
    }

10. HDFS-related basics

I don't know Hadoop in depth; I've mainly used some Hadoop commands for reading data.

HDFS is the Hadoop Distributed File System. HDFS files are distributed across a cluster of servers, which provides replication and fault-tolerance guarantees.

It is suitable for storing very large files with streaming access, but not for millisecond-level access; there is some latency.

Some of the command-line commands I used, for example:

hadoop fs -copyFromLocal // copy a local file into HDFS
hadoop fs -mkdir // create a directory
hadoop fs -ls // list directory contents