This is a baseline using logistic regression.
We treat both binary classification tasks (clarity and conciseness) as probability estimation: each model outputs the probability that a title has the positive label.
The features we are using include:
- Length of the title (integer number)
- Whether the title contains a number (0 or 1)
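
The two features above can be sketched in a few lines of Python. This is only an illustration, not the code from the repository: the function name `extract_features` is made up here, and it assumes that "length" means the number of characters and that "number" means any digit in the title.

```python
import re
from typing import List


def extract_features(title: str) -> List[int]:
    """Return [title length, contains-number flag] for one product title."""
    # Assumption: length is counted in characters (a word count would also be plausible).
    length = len(title)
    # Assumption: "contains number" means at least one digit appears anywhere in the title.
    has_number = 1 if re.search(r"\d", title) else 0
    return [length, has_number]


if __name__ == "__main__":
    print(extract_features("Samsung Galaxy S8 64GB"))
    print(extract_features("Leather Wallet"))
```

Both values are plain integers, so the same feature vector can be fed to either model.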
We use the same features for both the clarity and the conciseness models. These features are quite basic and naive, so feel free to add your own features to improve the models.
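
To show what "same features, two models" can look like, here is a minimal standard-library sketch of logistic regression fitted by batch gradient descent. It is not the repository's implementation, which may well use a library instead. The feature rows and labels are made-up toy values, the helper names (`fit_logistic`, `predict_proba`) are hypothetical, and dividing the title length by 100 is an assumption made only to keep gradient descent well behaved.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this row: predicted probability minus label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, x):
    """Probability of the positive label for one feature row."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy rows (made up): [title length / 100, contains-number flag].
X = [[0.20, 0], [0.35, 1], [0.90, 1], [1.20, 1], [0.25, 0], [1.10, 0]]
clarity_labels = [1, 1, 1, 0, 1, 0]      # made-up labels
conciseness_labels = [1, 1, 0, 0, 1, 0]  # made-up labels

# One model per task, both trained on the same feature matrix.
clarity_model = fit_logistic(X, clarity_labels)
conciseness_model = fit_logistic(X, conciseness_labels)

short_title, long_title = [0.20, 0], [1.20, 0]
p_short = predict_proba(conciseness_model, short_title)
p_long = predict_proba(conciseness_model, long_title)

if __name__ == "__main__":
    print("P(concise | short title) =", round(p_short, 3))
    print("P(concise | long title)  =", round(p_long, 3))
```

Adding a new feature only means appending another column to each row of `X`; the fitting code stays the same for both tasks.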
The code is available on GitHub:
https://github.com/tqtg/CIKMCup2017_Lazada_BaselineLogisticRegression