The Fifth Elephant » Pune Data Hacknight
I had no prior working experience with big data, but I had always found it interesting. In my current role at STEC I analyze performance numbers, so I thought the hacknight would be the right place to get started.
Once everyone was in, we had a round of introductions and discussed the ideas posted on the website. We then listed the ideas for which we had datasets. I found the following ideas interesting:
- Travel Recommender
- IEEE ICDM Kaggle Competition
- Automatic Twitter Movie Reviews Analyzer
- Movie Star Social Media Popularity Meter
I decided to work on the “IEEE ICDM Kaggle Competition” with Jaidev. It took some time to figure out what the exact problem was and what the training sample looked like.
There was a listing of the products in one JSON file and reviews of the products in another JSON file. We had to match the reviews with the products. Rather than jumping into writing code to read the files, I decided to look at the data more carefully. After spending some time on that, I saw that we would have to write some kind of machine learning tool. I do not know much about that area, so Jaidev gave me some reference material on it.
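To make the matching problem concrete, here is a minimal sketch of a naive baseline that pairs each review with the product whose name shares the most words with the review text. It is not what we built at the hacknight and not the competition's method. The sample records and the field names (`id`, `name`, `text`) are made up for illustration, and the real files may be structured differently.

```python
import json

# Hypothetical sample data; the real competition files and field names may differ.
PRODUCTS_JSON = """[
  {"id": "p1", "name": "Acme Steel Water Bottle"},
  {"id": "p2", "name": "Zeta Wireless Mouse"}
]"""
REVIEWS_JSON = """[
  {"id": "r1", "text": "This wireless mouse tracks well"},
  {"id": "r2", "text": "The water bottle keeps drinks cold"}
]"""


def tokens(text):
    """Lowercase a string and split it into a set of words."""
    return set(text.lower().split())


def match_reviews(products, reviews):
    """Pair each review with the product whose name shares the most words with it."""
    matches = {}
    for review in reviews:
        review_words = tokens(review["text"])
        # Word overlap is a crude similarity score, used here only as a baseline.
        best = max(products, key=lambda p: len(tokens(p["name"]) & review_words))
        matches[review["id"]] = best["id"]
    return matches


products = json.loads(PRODUCTS_JSON)
reviews = json.loads(REVIEWS_JSON)
matches = match_reviews(products, reviews)
print(matches)
```

A word-overlap score like this breaks down quickly on real reviews that never mention the product name, which is where a proper machine learning approach would come in.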
I was not sure how to proceed, so I decided not to spend more time on the dataset. Instead I started looking up more basic things: what big data is, what data science means, why it is becoming more important, and so on. I found it fascinating how much we can do with data.
Sometime during the night Nikhil shared some information about graph databases and neo4j, which I found very useful. Shreyank told us what we can do with D3. He pulled his friends list from Facebook along with their connections to each other, and later showed some very nice graphs built from the data he had collected. I was amazed to see them.
HasGeek did a great job of organising the event. The venue was centrally located, spacious and had a good internet connection. We had plenty of snacks, soft drinks and energy drinks.
Everyone was free to work on any dataset, which was good. For beginners like me, it would have been much better to also have some simpler datasets with well-defined problem statements.
Overall I liked the event very much and would like to attend more such events. Meanwhile, I will try to work with simpler datasets.