Tuesday, June 22, 2010

The path to manipulate data....

On my way to working with Twitter data and its annotation, it seems I will have to know quite a bit about at least some of the technologies used to serialize data. The person who worked on this before me used JSON, so I have to peek into JSON in order to get the hang of it.

JSON is JavaScript Object Notation. Much like XML, it is used to serialize objects. In my case he used it to turn the comma-separated (CSV) data into objects that can be accessed rather decently.
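To get a feel for that CSV-to-objects step, here is a minimal sketch. It is not the earlier developer's actual code: the column names, the sample rows and the csvToObjects helper are all made up for illustration, and the parsing is naive (it does not handle quoted fields or commas inside values).

```javascript
// Hypothetical CSV input; the columns "user" and "text" are assumptions.
var csvText = "user,text\nnikhil,hello world\nalice,good morning";

// Naive CSV parsing: split on newlines, then on commas.
// The first line is treated as the header row.
function csvToObjects(text) {
  var lines = text.split("\n");
  var headers = lines[0].split(",");
  var rows = [];
  for (var i = 1; i < lines.length; i++) {
    var cells = lines[i].split(",");
    var row = {};
    for (var j = 0; j < headers.length; j++) {
      row[headers[j]] = cells[j];
    }
    rows.push(row);
  }
  return rows;
}

// One object per CSV row, accessible as tweets[0].user and so on.
var tweets = csvToObjects(csvText);

// Serialize the objects to JSON text for storage or transfer.
var jsonText = JSON.stringify(tweets);
```

Each row then becomes a plain object, so tweets[0].user reads more naturally than indexing into a split line.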

One good advantage of JSON over XML is that JSON is very lightweight, in that you essentially avoid the tagging that goes on with XML. So given the JSON text var text='{ "name":"Nikhil","game":["khokho","dharma-guru"] }', calling var obj=JSON.parse(text) (or, in older code, eval("(" + text + ")"), which is unsafe on untrusted input) instantly creates an object that you can access with the member-of "." operator; obj.name would then return my name.
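As a small sketch of that, assuming an environment where the built-in JSON.parse is available (modern browsers and Node.js), the JSON text is a string and parsing it gives back an ordinary object:

```javascript
// JSON text is just a string; keys and string values use double quotes.
var text = '{ "name": "Nikhil", "game": ["khokho", "dharma-guru"] }';

// JSON.parse turns the text into a regular JavaScript object.
var obj = JSON.parse(text);

// Members are reached with the "." operator, arrays with an index.
var name = obj.name;
var firstGame = obj.game[0];
```

Note that writing var v={...} without the quotes around it already creates an object directly; parsing is only needed when the data arrives as text.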

A more detailed description of this is found here:
http://msdn.microsoft.com/en-us/library/bb299886.aspx

Sunday, June 6, 2010

Working on Named entity recognizers.

From today onwards I will be working on a named entity recognizer (NER). It is supposed to be the Stanford NER software, trained with Twitter data. The Twitter data was annotated using MTurk and other crowdsourcing services. This annotated data will now go into the NER system, and in the end this should give us an NER that recognizes names in Twitter data.
This is the goal of the project.