Monday, September 29, 2014

Difficulty In Sentiment Analysis & Opinion Mining

When I want to pick a film, I always go to DOUBAN for reference. First, I look at the key words to find out which genres of film are likely to appeal to me. Then I compare the films' scores. Finally, I read some of the comments on the high-scoring films.



More and more people now prefer shopping online, and I think many of them, like me when I pick a film, visit similar platforms to gather information before making a choice. Our opinions and behavior are influenced by other people's, which is why sentiment analysis is becoming increasingly valuable to study.

In the fourth course, we learned some of the basics of opinion mining. I want to take DOUBAN, mentioned above, as an example. How should we deal with its enormous number of comments? Intuitively, adjectives carry much of the sentiment. In addition, we use unigrams as features, because they are simpler and give satisfactory results. We also have to handle negation: we add NOT_ to every word between a negation word and the following punctuation mark, so "don't like this movie" becomes "don't NOT_like NOT_this NOT_movie". Then we can use an appropriate classifier to finish the classification.
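The pipeline described above (unigram features, NOT_ negation marking, then a classifier) can be sketched in a few lines. This is a minimal sketch, not the course's implementation: the post only says "an appropriate classifier", so multinomial Naive Bayes with add-one smoothing is an assumption, and the negation list and toy training reviews are hypothetical. It works on English text with a simple regular-expression tokenizer; Chinese comments such as DOUBAN's would first need word segmentation.

```python
import math
import re
from collections import Counter

# Hypothetical negation list and punctuation set, chosen for illustration only.
NEGATIONS = {"not", "no", "never", "don't", "didn't", "isn't", "can't"}
PUNCTUATION = set(".,!?;:")

def tokenize(text):
    # Lowercase; keep apostrophes inside words ("don't" stays one token) and
    # emit each punctuation mark as its own token.
    return re.findall(r"[a-z']+|[.,!?;:]", text.lower())

def mark_negation(tokens):
    """Prefix NOT_ to every token between a negation word and the next punctuation mark."""
    out, negating = [], False
    for tok in tokens:
        if tok in PUNCTUATION:
            negating = False
            out.append(tok)
        elif negating:
            out.append("NOT_" + tok)
        else:
            out.append(tok)
            negating = tok in NEGATIONS
    return out

def features(text):
    # Unigram features after negation marking; punctuation tokens are dropped.
    return [t for t in mark_negation(tokenize(text)) if t not in PUNCTUATION]

def train_nb(docs):
    """Multinomial Naive Bayes; docs is a list of (text, label) pairs."""
    doc_counts = Counter(label for _, label in docs)
    word_counts = {c: Counter() for c in doc_counts}
    for text, label in docs:
        word_counts[label].update(features(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    log_priors = {c: math.log(n / len(docs)) for c, n in doc_counts.items()}
    return log_priors, word_counts, vocab

def classify(model, text):
    log_priors, word_counts, vocab = model
    scores = {}
    for c, score in log_priors.items():
        total = sum(word_counts[c].values())
        for w in features(text):
            if w in vocab:  # words never seen in training are ignored
                # Add-one (Laplace) smoothing of P(w | c)
                score += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get)

# Toy training data, invented for illustration.
train = [
    ("a great and moving film", "pos"),
    ("i like this movie, the plot is great", "pos"),
    ("boring plot and bad acting", "neg"),
    ("i don't like this movie", "neg"),
]
model = train_nb(train)
print(mark_negation(tokenize("don't like this movie")))
# ["don't", 'NOT_like', 'NOT_this', 'NOT_movie']
print(classify(model, "great acting"))   # pos
print(classify(model, "don't like it"))  # neg
```

With negation marking, the test phrase "don't like it" produces the feature NOT_like, which has only been seen in negative training reviews, so it is classified as negative.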



However, sentiment analysis is not that easy. While browsing web pages, I found many euphemistic or sarcastic comments, such as "If you are reading this because it is your darling fragrance, please wear it at home exclusively, and tape the windows shut." or "The writer runs the gamut of emotion from A to B." (a sarcastic way of saying the writer shows almost no emotional range). Other reviews build up expectations and then overturn them: "This film should be brilliant.  It sounds like a great plot, the actors are first grade, and the supporting cast is good as well, and Stallone is attempting to deliver a good performance. However, it can’t hold up." Almost every opinion word in this review is positive, yet the overall verdict is negative. I look forward to discussing this with classmates who have ideas about how to solve the problem.
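To see why these examples are hard, here is a naive word-counting scorer applied to two of the quoted reviews. It is a minimal sketch assuming a tiny, hypothetical sentiment lexicon; real lexicons are far larger, but the weakness shown here (no notion of sarcasm or word order) is the same.

```python
import re

# Hypothetical toy sentiment lexicon, for illustration only.
POSITIVE = {"brilliant", "great", "good"}
NEGATIVE = {"bad", "boring", "awful"}

def lexicon_score(text):
    """Naive bag-of-words score: positive word count minus negative word count."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

perfume = ("If you are reading this because it is your darling fragrance, please wear "
           "it at home exclusively, and tape the windows shut.")
stallone = ("This film should be brilliant. It sounds like a great plot, the actors are "
            "first grade, and the supporting cast is good as well, and Stallone is "
            "attempting to deliver a good performance. However, it can't hold up.")

print(lexicon_score(perfume))   # 0: looks neutral, but the review is scathing
print(lexicon_score(stallone))  # 4: looks positive, but the verdict is negative
```

The sarcastic perfume review contains no overtly positive or negative words, and in the film review the build-up of praise outweighs the single closing "However", because the scorer ignores the order in which opinions appear.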

Saturday, September 20, 2014

My Thoughts on Social Media Analytics

After taking three courses, I have learned a lot about social media analytics, and I am glad to share my thoughts with you.

As society advances, we now live in a world flooded with messages. Everybody uses social platforms to get and share information relevant to them, and more and more netizens have grown used to social software such as Facebook, Google+, WeChat and Qzone. We can acquire large amounts of data from these social circles, which can contribute a great deal to understanding users' thoughts and improving our services.


However, how can we deal with such an enormous data resource? Text content is comparatively easy to analyse, and this is where natural language processing (NLP) is required. NLP is a field that involves computer science, artificial intelligence, linguistics, human-computer interaction and so on. Through NLP, we can learn about customers' ideas, sentiment and behavior. To achieve this, we first have to remove useless words (such as stop words) and punctuation.
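The clean-up step mentioned above can be sketched as follows. This is a minimal sketch assuming English text and a short, hypothetical stop-word list; real stop-word lists are much longer and depend on the language and task.

```python
import string

# Hypothetical, deliberately short stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "are", "and", "or", "of", "to", "in", "it", "this", "i"}

def preprocess(text):
    """Lowercase, strip punctuation, split on whitespace, and drop stop words."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [w for w in text.split() if w not in STOP_WORDS]

print(preprocess("I love this phone, and the camera is amazing!"))
# ['love', 'phone', 'camera', 'amazing']
```

Stripping punctuation also removes apostrophes, turning "don't" into "dont", so any step that relies on punctuation (such as marking negated words with NOT_ up to the next punctuation mark) should run before this clean-up.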


To compare documents, we classify the text by representing each document as a vector of term weights and applying methods from probability and mathematical statistics. In addition, clustering can group documents without labelled data. While learning the K-Means clustering algorithm, I was reminded of what I knew about pattern recognition. Both use similar methods to divide items into clusters, and both are unsupervised approaches related to artificial intelligence. The second figure shows how pattern recognition works. Although they deal with different objects, both reflect the application of functions and the beauty of mathematics.
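Below is a minimal sketch of the two ideas in this paragraph: comparing documents as vectors and clustering them with K-Means. It assumes raw term-frequency vectors (the post does not say which weighting was used; TF-IDF is a common alternative), cosine similarity for comparison, and plain K-Means with Euclidean distance. Initialising the centroids with the first k points is a simplification for reproducibility; implementations usually pick them randomly or with k-means++. The toy documents and points are invented.

```python
import math
from collections import Counter

def tf_vector(tokens, vocab):
    """Term-frequency vector of a tokenized document over a fixed vocabulary."""
    counts = Counter(tokens)
    return [counts[t] for t in vocab]

def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def kmeans(points, k, iterations=100):
    """Plain K-Means with Euclidean distance; the first k points are the initial centroids."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(xs) / len(members) for xs in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids

docs = [["great", "plot", "great", "acting"], ["great", "acting"], ["boring", "plot"]]
vocab = sorted({t for d in docs for t in d})
vectors = [tf_vector(d, vocab) for d in docs]
print(cosine(vectors[0], vectors[1]) > cosine(vectors[0], vectors[2]))  # True

points = [(1, 1), (1.5, 2), (8, 8), (9, 8.5), (1, 0.5), (8.5, 9)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 0, 1, 1, 0, 1]
```

The two steps alternate until no centroid moves: points are assigned to the nearest centroid, then each centroid moves to the mean of its points. No labels are needed at any stage, which is what makes the method unsupervised.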

Social Media Analytics is an advanced and practical course. I do believe I will enjoy it and benefit a great deal from it.