For this competition, you are expected to work with the dataset by Lazada. Use of external data (beyond that provided by the competition) is permitted, provided the data is freely available.
If you are using a source of external data, you must post the source to the official external data forum thread no later than two weeks prior to the deadline of Phase 1. Once a source is posted here, you do not need to repost it.
This requirement is to ensure:
1. You have obtained the data legally.
2. The organizers and the community have the opportunity to examine the validity and the appropriateness of the data.
3. All the participants are on the same footing, and no one is advantaged because of privileged access to a special dataset.
4. Sharing data with each other might result in better insights and more interesting models.
The organizers reserve the right to rule out specific datasets if they are found to be inappropriate.
I will probably end up using GloVe word embeddings, plus some Python packages that provide trained models, such as spaCy, NLTK and antispam. Posted by: mnicosia @ July 8, 2017, 11:09 a.m.
I will also use GloVe word embeddings and maybe some corpora from NLTK. Posted by: victor191 @ July 9, 2017, 9 a.m.
I used pre-trained word2vec, char2vec and NLTK resources. Posted by: thanhvu @ July 9, 2017, 1:16 p.m.
I will use publicly available text embeddings. I will also use Lazada website links for any additional data. Posted by: GD @ July 10, 2017, 4:20 a.m.
I have used GloVe embeddings, spaCy and NLTK. Posted by: mcp @ July 10, 2017, 6:30 a.m.
I used SentiWordNet. Posted by: Murkrow @ July 10, 2017, 10:20 a.m.
I might use a color dictionary that I generated myself. Posted by: sherryxue1991 @ July 10, 2017, 11:02 a.m.
- Data: English Wikipedia articles data dump
- Pre-trained models: GloVe (Stanford NLP Group), Word2Vec (Google), OpenNLP
Can I use some web pages from the Lazada website? Thank you. Posted by: sherryxue1991 @ July 10, 2017, 3:27 p.m.
I am using word embeddings (publicly available), spaCy and NLTK. Posted by: samarthagarwal23 @ July 10, 2017, 4:01 p.m.
I also use the NLTK stopword corpus. Posted by: Murkrow @ July 10, 2017, 4:03 p.m.
We used GloVe embeddings, spaCy, NLTK, and a list of product brands from Lazada. Posted by: Saigonapps @ July 10, 2017, 10:09 p.m.
@sherryxue1991 Sure, you can use Lazada website data.
```can I use some web page in lazada website? thank you.```
Hi, my team may use GloVe embeddings, spaCy and NLTK. Posted by: TangYifan @ July 18, 2017, 7:58 a.m.
I will use tools like NLTK, spaCy, BeautifulSoup, scikit-learn and Keras for this competition. Posted by: naiven @ July 18, 2017, 10:39 a.m.
My team is using GloVe word embeddings and MeaningCloud text analysis.
YifanTang's post (cont'd)
We also use GoogleNews-vectors-negative300.bin. Posted by: fangyizhang @ July 21, 2017, 8:46 a.m.