Timestamp consistency

michaelwilde
@michaelwilde Michael Wilde

For the Streaming API: yes, I know it's JSON, and one can parse the JSON to read the "created_at" attribute at the top level. But in my case, I'm just parsing the raw text. It would be nice to have a field whose name is unique to THIS POST. If I'm writing a regex to find "created_at", it's very difficult to guarantee which "created_at" is the exact time the post was, well, created. With retweets, Twitter includes both the original tweet and the retweet. There are also geo tweets, and what appear to be about five or six different types of messages. JSON parsing at massive volume is much slower than I'd like, whereas a straight regex on a unique field name would be freakin' sweet.

Also, could y'all please put your timestamps in milliseconds? I know Twitter processes more than one message per second, and millisecond resolution would make ordering messages more accurate.

7 weeks 6 days ago

Replies

michaelwilde
@michaelwilde Michael Wilde

...and another thing. Twitter guys, would you be open to changing the position of the "created_at" field so that, in the text string that is the JSON message, it appears as the first or second field?

3 weeks 2 days ago
episod
@episod Taylor Singletary

The Streaming API really isn't optimized for use with regular expressions, and you'll continue to find this difficult, especially since the order of fields is in no way guaranteed. The data is hierarchical, so you'll find created_at fields at different levels of the hierarchy (a tweet within a tweet, a user within a tweet, a user within a tweet within a tweet). You may find it much easier to use a JSON parser so that you have a representative hash to work with instead.
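
A minimal sketch of that approach in Python, for illustration only. The sample message is made up and heavily abbreviated; the nesting (`user`, `retweeted_status`) mirrors the hierarchy described above, the timestamp layout is assumed to be the usual `Tue Jun 01 12:05:09 +0000 2010` style, and `post_created_at` is a hypothetical helper name.

```python
import json
import re
from datetime import datetime

# Hypothetical, abbreviated retweet message (not real data). The key order is
# deliberately unhelpful, since the order of fields is not guaranteed.
RAW = json.dumps({
    "user": {"created_at": "Mon Mar 02 08:00:00 +0000 2009", "screen_name": "retweeter"},
    "retweeted_status": {
        "created_at": "Tue Jun 01 12:00:00 +0000 2010",
        "user": {"created_at": "Sat Jan 05 09:30:00 +0000 2008", "screen_name": "author"},
    },
    "created_at": "Tue Jun 01 12:05:09 +0000 2010",
})


def post_created_at(raw_line):
    """Return the creation time of the outermost message only."""
    message = json.loads(raw_line)
    # Only the top-level key is read, so nested created_at values are ignored.
    return datetime.strptime(message["created_at"], "%a %b %d %H:%M:%S %z %Y")


# A naive regex simply returns whichever created_at happens to come first.
first_regex_hit = re.search(r'"created_at":\s*"([^"]*)"', RAW).group(1)

posted = post_created_at(RAW)
```

Here the regex picks up the retweeting user's account creation date, while the parsed hash gives the time of the post itself regardless of where the field sits in the string.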

3 weeks 2 days ago
michaelwilde
@michaelwilde Michael Wilde

Does Twitter not control the structure of a tweet when it is created? If so, might you be willing to put things like the timestamp at the beginning? I actually don't need the JSON structure to be parsed at the time I'm consuming the data. I parse it when I'm retrieving it. It's much faster that way, as I'm using an inverted time-series indexing technology known as "Splunk".
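
To make the consume-now, parse-later idea concrete, here is a rough sketch in plain Python. It is not how Splunk works internally; `consume` and `retrieve` are hypothetical names, the in-memory `StringIO` objects stand in for the real stream and the real store, and the two sample messages are made up.

```python
import io
import json


def consume(stream, sink):
    """Ingest: copy each non-empty raw line to storage without parsing it."""
    count = 0
    for line in stream:
        if line.strip():  # skip blank lines between messages
            sink.write(line if line.endswith("\n") else line + "\n")
            count += 1
    return count


def retrieve(source):
    """Retrieval: parse lazily, one stored line at a time."""
    for line in source:
        yield json.loads(line)


# Hypothetical two-message stream with a blank line in between.
incoming = io.StringIO(
    '{"created_at": "Tue Jun 01 12:05:09 +0000 2010", "text": "one"}\n'
    "\n"
    '{"created_at": "Tue Jun 01 12:05:10 +0000 2010", "text": "two"}\n'
)
store = io.StringIO()
stored = consume(incoming, store)

store.seek(0)
texts = [message["text"] for message in retrieve(store)]
```

The JSON cost is paid only for the messages actually retrieved, and the top-level "created_at" can then be read from the parsed structure rather than matched by position in the raw string.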

3 weeks 2 days ago