1. Count Vectors + RidgeClassifier
    # Count Vectors + RidgeClassifier
output: 0.7410794074418383
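As a minimal sketch of this baseline (not the original code), the snippet below fits a `CountVectorizer` and a `RidgeClassifier` on a tiny made-up corpus and scores a held-out split. The corpus, the split, and the choice of macro-averaged F1 are assumptions for illustration only.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import RidgeClassifier
from sklearn.metrics import f1_score

# Hypothetical toy corpus (0 = sports, 1 = finance); not the real dataset.
train_texts = [
    "the striker scored twice in the final",
    "the team won the league title",
    "the coach praised the goalkeeper",
    "the bank raised interest rates again",
    "shares fell after the earnings report",
    "investors sold bonds as rates climbed",
]
train_labels = [0, 0, 0, 1, 1, 1]
val_texts = [
    "the goalkeeper saved the final penalty",
    "the bank report lifted shares",
]
val_labels = [0, 1]

# Bag-of-words counts; the vocabulary is learned from the training split only.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)
X_val = vectorizer.transform(val_texts)

clf = RidgeClassifier()
clf.fit(X_train, train_labels)
val_pred = clf.predict(X_val)

# Macro averaging is an assumption; the source only reports "f1_score".
score = f1_score(val_labels, val_pred, average="macro")
print(score)
```

Fitting the vectorizer on the training split alone keeps validation words out of the vocabulary, so the score is not flattered by leakage.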
2. TF-IDF + RidgeClassifier
    from sklearn.feature_extraction.text import TfidfVectorizer
output: 0.8721598830546126
Try a larger max_features (here 5000, with 1- to 3-grams):
    tfid_try = TfidfVectorizer(ngram_range=(1, 3), max_features=5000)
output: 0.8850817067811825
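To see what `max_features` actually controls, here is a small sketch on a hypothetical four-document corpus. It shows that the parameter caps the number of n-gram columns in the TF-IDF matrix; the documents and the cap values 5 and 20 are made up for the demo.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical documents; the real corpus is not shown in the source.
docs = [
    "the striker scored twice in the final",
    "the team won the league title",
    "the bank raised interest rates again",
    "shares fell after the earnings report",
]

shapes = {}
for max_features in (5, 20):
    # ngram_range=(1, 3) extracts unigrams, bigrams, and trigrams;
    # max_features keeps only the most frequent terms across the corpus.
    tfidf = TfidfVectorizer(ngram_range=(1, 3), max_features=max_features)
    X = tfidf.fit_transform(docs)
    shapes[max_features] = X.shape
    print(max_features, X.shape)
```

A larger cap gives the classifier more n-grams to work with, at the cost of a wider (though still sparse) feature matrix.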
3. LogisticRegression
    from sklearn import linear_model
output: 0.8464704900433653
4. SGDClassifier
    tfidf = TfidfVectorizer(ngram_range=(1, 3), max_features=5000)
output: 0.8461511856339045
5. SVM
    from sklearn import svm
output: 0.883129115819089
6. Summary
method | f1_score |
---|---|
Count Vectors + RidgeClassifier | 0.7410794074418383 |
TF-IDF + RidgeClassifier | 0.8721598830546126 |
TF-IDF + RidgeClassifier (ngram_range=(1, 3), max_features=5000) | 0.8850817067811825 |
TF-IDF + LogisticRegression | 0.8464704900433653 |
TF-IDF + SGDClassifier | 0.8461511856339045 |
TF-IDF + SVM | 0.883129115819089 |
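
A comparison like the one in the table can be produced with a single loop over classifiers that share the same TF-IDF features. The sketch below is one possible way to do it, not the original code: the corpus is a made-up toy set, macro-averaged F1 is assumed, and `svm.SVC(kernel="linear")` is a guess, since the source only shows `from sklearn import svm`.

```python
from sklearn import linear_model, svm
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score

# Hypothetical toy corpus (0 = sports, 1 = finance); swap in the real data.
train_texts = [
    "the striker scored twice in the final",
    "the team won the league title",
    "the coach praised the goalkeeper",
    "the bank raised interest rates again",
    "shares fell after the earnings report",
    "investors sold bonds as rates climbed",
]
train_labels = [0, 0, 0, 1, 1, 1]
val_texts = [
    "the goalkeeper saved the final penalty",
    "the bank report lifted shares",
]
val_labels = [0, 1]

# Same vectorizer settings as in the TF-IDF sections above.
tfidf = TfidfVectorizer(ngram_range=(1, 3), max_features=5000)
X_train = tfidf.fit_transform(train_texts)
X_val = tfidf.transform(val_texts)

models = {
    "RidgeClassifier": linear_model.RidgeClassifier(),
    "LogisticRegression": linear_model.LogisticRegression(),
    "SGDClassifier": linear_model.SGDClassifier(random_state=0),
    "SVM": svm.SVC(kernel="linear"),  # assumed variant, see lead-in
}

# Fit every model on identical features and collect validation F1.
scores = {}
for name, model in models.items():
    model.fit(X_train, train_labels)
    val_pred = model.predict(X_val)
    scores[name] = f1_score(val_labels, val_pred, average="macro")

print("method | f1_score |")
print("---|---|")
for name, score in scores.items():
    print(f"TF-IDF + {name} | {score} |")
```

Scores on a corpus this small are not meaningful; the point is only that all models see identical features and an identical split, which is what makes the rows of the table comparable.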