Incomplete json response

mihneagiurgea
@mihneagiurgea Mihnea Giurgea

Hi,

Lately we've been receiving incomplete JSON responses from multiple Twitter API endpoints, using different Python libraries. The cause is always the same: the response we get from the Twitter API (format=json) is sometimes incomplete (it ends in an unfinished string, which makes it invalid JSON).

We've encountered this:
* on different endpoints: statuses/mentions, statuses/user_timeline, search, users/lookup
* using different libraries: urllib2 returns an invalid json response, and httplib raises an IncompleteRead error
* and on different environments: Amazon EC2 servers (running Ubuntu 10), as well as localhost (running something else).

Could this be a Twitter API issue, or something on our end?
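
For anyone who wants to tell the two failure modes apart on their own side, here is a minimal sketch (Python 3). The function name check_json_body is made up for illustration and is not part of any library. It assumes you have the raw body bytes exactly as received, with no Content-Encoding applied, and it compares their length against Content-Length before trying to parse:

```python
import json


def check_json_body(headers, body):
    """Return (ok, reason) for a raw HTTP response body.

    headers: dict of response headers (names compared case-insensitively)
    body:    bytes exactly as received, before any decoding
    """
    lowered = {k.lower(): v for k, v in headers.items()}
    declared = lowered.get("content-length")
    if declared is not None and int(declared) != len(body):
        return False, "got %d bytes, Content-Length says %s" % (len(body), declared)
    try:
        json.loads(body.decode("utf-8"))
    except ValueError as exc:
        # json.JSONDecodeError and UnicodeDecodeError are both ValueError subclasses
        return False, "invalid JSON: %s" % exc
    return True, "ok"
```

A length mismatch suggests the body was cut off in transit. A matching length with invalid JSON would point at whatever generated the body instead.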

1 week 6 days ago

Replies

episod
@episod Taylor Singletary

I'm investigating similar reports to this issue -- we're having trouble reproducing. Is there any way you can capture the HTTP headers you're sending and the HTTP headers you get in response for a request that exhibits this behavior?

1 week 6 days ago
mihneagiurgea
@mihneagiurgea Mihnea Giurgea

Yes, I think we can do that. I'll come back with some headers :)

1 week 6 days ago
mihneagiurgea
@mihneagiurgea Mihnea Giurgea

Here are some header examples:

Request headers:
{'Host': 'search.twitter.com',
'User-agent': 'Python-urllib/2.6',
'Connection': 'close'};

Response headers:
['Cache-Control: max-age=15, must-revalidate, max-age=300\r\n',
'Expires: Thu, 26 Jul 2012 07:07:03 GMT\r\n',
'Content-Type: application/json;charset=utf-8\r\n',
'X-Transaction: d6834a10b953e1a1\r\n',
'X-Frame-Options: SAMEORIGIN\r\n',
'Content-Length: 66658\r\n',
'Vary: Accept-Encoding\r\n',
'Date: Thu, 26 Jul 2012 07:02:03 GMT\r\n',
'X-Varnish: 68396455\r\n',
'Age: 0\r\n',
'Server: tfe\r\n',
'X-Cache: MISS from somehost.host\r\n',
'X-Cache-Lookup: MISS from somehost.host:3128\r\n',
'Via: 1.1 varnish, 1.0 somehost.host:3128 (squid/2.7.STABLE7)\r\n',
'Connection: close\r\n']
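
Headers like the ones above can be captured with something along these lines. This is a rough sketch for Python 3, where httplib is named http.client. capture_exchange and read_body are hypothetical helper names, and the User-Agent value is a placeholder. It records the request headers, the response headers, and whether the body arrived in full:

```python
import http.client


def read_body(resp):
    """Read the whole body; return (bytes_received, complete_flag)."""
    try:
        return resp.read(), True
    except http.client.IncompleteRead as exc:
        # exc.partial holds whatever arrived before the connection ended
        return exc.partial, False


def capture_exchange(host, path, port=80, timeout=10):
    """Perform one GET and return a dict that can be pasted into a report."""
    request_headers = {"User-Agent": "header-capture-sketch", "Connection": "close"}
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path, headers=request_headers)
        resp = conn.getresponse()
        body, complete = read_body(resp)
        return {
            "request_headers": request_headers,
            "status": resp.status,
            "response_headers": resp.getheaders(),
            "received_bytes": len(body),
            "complete": complete,
        }
    finally:
        conn.close()
```

Logging this dict only when "complete" is False keeps the overhead low on a busy site.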

1 week 5 days ago
episod
@episod Taylor Singletary

Thank you for your help, I'll submit this to the investigators.

1 week 5 days ago
dpn
@dpn David P. Novakovic

Appears my problem is a dupe of this too. Sorry about reposting.

https://dev.twitter.com/discussions/9636

1 week 2 days ago
episod
@episod Taylor Singletary

Hi David,

Can you share a request and response cycle exhibiting this problem, much as @mihneagiurgea has done above? That would be very useful in getting to the bottom of this. Thanks!

1 week 1 day ago
KiShodan
@KiShodan Kai Koch

I have the same problem, using the Streaming API endpoints 'statuses/filter' and 'statuses/sample'.
I use node.js to connect, read the JSON chunks, and currently simply store the received messages in MySQL.

When using the 'statuses/filter' endpoint, I store about 6.64 tweets per second. The error rate is about 0.045 (error/total).
(Running time: 12 h - Messages: total 286881 / limit 2 / tweet 274041 / error 12838)

When using the 'statuses/sample' endpoint, I store about 54.29 tweets per second. The error rate is about 0.037 (error/total).
(Running Time: 1 h - Messages: total 195458 / delete 15593 / tweet 172541 / error 7324)
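
As a quick check of the figures above, here is a small sketch (rates is just an illustrative helper name) that recomputes them from the totals in parentheses. Note that the per-second figures match the total message count divided by the running time, so they count every stored message, not only tweets:

```python
def rates(total, errors, seconds):
    """Return (messages per second, error rate) rounded as in the post."""
    return round(total / seconds, 2), round(errors / total, 3)


# statuses/filter: 12 h run, 286881 messages in total, 12838 errors
filter_rates = rates(286881, 12838, 12 * 3600)

# statuses/sample: 1 h run, 195458 messages in total, 7324 errors
sample_rates = rates(195458, 7324, 1 * 3600)

print(filter_rates)  # (6.64, 0.045)
print(sample_rates)  # (54.29, 0.037)
```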

The chopped JSON chunks look like this:

file_background_images\/509401083\/tumblr_lxllgazdOO1qbautoo1_500.jpg","utc_offset":null},"id":229874347437862912,"in_reply_to_status_id_str":null,"id_str":"229874347437862912","entities":{"user_mentions":[{"indices":[3,14],"screen_name":"Jen_Budden","name":"Jen Budden","id":31151520,"id_str":"31151520"}],"hashtags":[{"text":"London2012","indices":[73,84]}],"urls":[]},"in_reply_to_screen_name":null,"truncated":false}

and

,"default_profile":false,"statuses_count":1024,"following":null,"profile_background_image_url":"http:\/\/a0.twimg.com\/profile_background_images\/581093985\/x08fcc2f21db30a08b70b2661b57b101.jpg","is_translator":false,"profile_link_color":"FC5884","description":"I am an auditor for the Federal Government.  I am married and have three children, two squirrels, and one Yorkie.  ","show_all_inline_media":true,"profile_background_color":"303253","default_profile_image":false,"profile_background_tile":true,"screen_name":"tjrutledge","follow_request_sent":null,"time_zone":"Pacific Time (US & Canada)","favourites_count":77,"created_at":"Thu Oct 16 03:25:15 +0000 2008","profile_sidebar_fill_color":"A39BDE","protected":false,"followers_count":25,"url":null,"profile_image_url":"http:\/\/a0.twimg.com\/profile_images\/2426077455\/8XDciw5s_normal","name":"Tara","friends_count":69,"profile_sidebar_border_color":"39D8E5","id":16801549,"lang":"en","profile_background_image_url_https":"https:\/\/si0.twimg.com\/profile_background_images\/581093985\/x08fcc2f21db30a08b70b2661b57b101.jpg","utc_offset":-28800},"id":230059910665228290,"entities":{"urls":[],"hashtags":[],"user_mentions":[{"indices":[0,14],"id_str":"473985169","screen_name":"OfficialLyh","name":"L\u00fdgya! ","id":473985169},{"indices":[15,26],"id_str":"427237476","screen_name":"futura2015","name":"karma woman","id":427237476}]}}

There seems to be no pattern to where the messages are chopped.

I did the tests at home with only 16 Mbit/s downstream.
To check whether it is a bandwidth problem, I will run the tests on a server with higher bandwidth and report back when I am done.
I will also see if I can supply you with some headers.

1 week 1 day ago
KiShodan
@KiShodan Kai Koch

I ran the 'statuses/sample' stream for an hour on the server. It still looks like the same problem.

[2012-07-31T05:29:55.227Z] 180991 Chunks inserted.
Chunks per second: 50.275 Total Error (6796) Error Rate (0.038)
Sever shutdown. Uptime: 60 min
Starttime was: 2012-07-31T04:29:55.124Z

1 week 21 hours ago
KiShodan
@KiShodan Kai Koch

I think I found the problem on my side.

As far as I understand the networking side, an HTTP stream may arrive fragmented.

The parser I use (https://github.com/ttezel/twit) assumes that each JSON object is delivered whole and is never fragmented.
It also does not make use of the 'delimited' parameter when connecting to the stream.

If a JSON object is fragmented, the parser neither parses the fragments as one combined chunk nor saves the fragment for later.
Therefore it fails to recognize that the chunk is spread over 2 packets.

Can anyone using another library confirm this behaviour?

I will try to fix that in the code I use and report back.
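
To illustrate the general idea of keeping fragments around until the rest arrives, here is a minimal sketch in Python. It is not the fix from the twit library. It assumes that messages in the stream are separated by "\r\n" and that blank lines are keep-alives, and StreamBuffer is a made-up name:

```python
import json


class StreamBuffer:
    """Collects raw chunks and yields only complete JSON messages."""

    def __init__(self):
        self._buffer = b""

    def feed(self, chunk):
        """Add one chunk (bytes); return the messages completed by it."""
        self._buffer += chunk
        # Everything before the last separator is complete; the tail is kept
        # for the next call because it may be the start of another message.
        *complete, self._buffer = self._buffer.split(b"\r\n")
        messages = []
        for line in complete:
            if line.strip():  # skip blank keep-alive lines
                messages.append(json.loads(line.decode("utf-8")))
        return messages
```

Buffering bytes rather than decoded text also avoids breaking a multi-byte UTF-8 character that happens to straddle two chunks.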

1 week 20 hours ago
allangrant
@allangrant Allan Grant

We've been hitting the same issue multiple times a day during regular imports of tweets over the last week. It didn't happen prior to that. Regrettably, we don't have an easy way of capturing the request/response headers, but we just wanted to confirm that we're getting these malformed responses as well.

1 week 1 day ago
KiShodan
@KiShodan Kai Koch

My previous posting is still in moderation, I guess because I posted a link to GitHub.

My problem is solved: there was a fix for it in one of the parser's branches.

In short, until the moderators approve my rather long posting:
The parser I used did not check for fragmented JSON objects, and that caused the bogus behaviour.
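
For the 'delimited' parameter, a length-prefixed stream can be reassembled in a similar way. The sketch below is only an illustration under an assumption I have not verified against the live stream: that each message is preceded by a line holding its length in bytes as a decimal number, followed by exactly that many bytes of payload. LengthDelimitedBuffer is a hypothetical name:

```python
import json


class LengthDelimitedBuffer:
    """Reassembles messages framed as '<byte count>\\n<payload>' (assumed)."""

    def __init__(self):
        self._buffer = b""

    def feed(self, chunk):
        """Add one chunk (bytes); return the messages completed by it."""
        self._buffer += chunk
        messages = []
        while True:
            header, sep, rest = self._buffer.partition(b"\n")
            if not sep:
                break  # the length line itself is not complete yet
            if not header.strip():
                self._buffer = rest  # blank keep-alive line, drop it
                continue
            length = int(header.strip().decode("ascii"))
            if len(rest) < length:
                break  # payload still incomplete, wait for more data
            payload, self._buffer = rest[:length], rest[length:]
            messages.append(json.loads(payload.decode("utf-8")))
        return messages
```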

1 week 16 hours ago
mihneagiurgea
@mihneagiurgea Mihnea Giurgea

Any updates on this? Have you guys managed to confirm that this is indeed a Twitter API issue?

5 days 15 hours ago
dpn
@dpn David P. Novakovic

I still need to add logging of metadata for the requests to Twitter that fail. This is not trivial for us, as it means deploying new code to a high-volume site :)

5 days 15 hours ago
f_wahlgren
@f_wahlgren Fredrik Wahlgren

I get the same problem with malformed Atom responses.

The search URL used is:
http://search.twitter.com/search.atom?lang=sv&q=+ahlens&rpp=100

This is what it looks like when it fails and returns less data than it is supposed to:

tag:search.twitter.com,2005:search/ahlens2012-07-25T10:47:38Z100svtag:search.twitter.com,2005:2280790527564881922012-07-25T10:47:38Z@<a class=" " href="https://twitter.com/puckosmulan">puckosmulan</a> Då kan vi ses vid Åhléns-ingången vid 15 så går vi till det bezta ztellet i ztada.2012-07-25T10:47:38Zrecent<a href="http://www.tweetdeck.com" rel="nofollow">TweetDeck</a>svmaria_s (Maria Sjöberg)http://twitter.com/maria_stag:search.twitter.com,2005:2280778330696826882012-07-25T10:42:47ZRT @<a class=" " href="https://twitter.com/nelisen">nelisen</a>: Dagens inspiration; @<a class=" " href="https://twitter.com/johnvalencia">johnvalencia</a> @ J.Lindeberg Åhléns City <a href="http://t.co/0YNgTUZy">http://t.co/0YNgTUZy</a>2012-07-25T10:42:47Zrecent<a href="http://instagr.am" rel="nofollow">instagram</a>svjohn__valencia (John Valencia)http://twitter.com/john__valenciatag:search.twitter.com,2005:2280753569203732482012-07-25T10:32:57ZHar just kissat för 10 spänn på Åhléns City! Tacka vet jag New York. Där kostar det 0 $.2012-07-25T10:32:57Zrecent<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>svSaJlevik (Åsa Jälevik)http://twitter.com/SaJleviktag:search.twitter.com,2005:2280709468823306252012-07-25T10:15:25ZI parfymdisken på Åhléns får man tydligen inte berätta när MJs nya doft Dot släpps. Vet ni?2012-07-25T10:15:25Zrecent<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>sv

And this is how it should look, requested about 10 seconds after the first malformed one:

<?xml version="1.0" encoding="UTF-8"?>
<feed xml:lang="en-US" xmlns:georss="http://www.georss.org/georss" xmlns:twitter="http://api.twitter.com/" xmlns="http://www.w3.org/2005/Atom" xmlns:openSearch="http://a9.com/-/spec/opensearch/1.1/" xmlns:google="http://base.google.com/ns/1.0"><id>tag:search.twitter.com,2005:search/ahlens</id><link rel="alternate" href="http://search.twitter.com/search?q=ahlens" type="text/html"/><link rel="self" href="http://search.twitter.com/search.atom?q=ahlens" type="application/atom+xml"/><link rel="search" href="http://twitter.com/opensearch.xml" type="application/opensearchdescription+xml"/><link rel="refresh" href="http://search.twitter.com/search.atom?since_id=228079052756488192&q=ahlens&lang=sv" type="application/atom+xml"/><updated>2012-07-25T10:47:38Z</updated><openSearch:itemsPerPage>100</openSearch:itemsPerPage><openSearch:language>sv</openSearch:language><link rel="next" href="http://search.twitter.com/search.atom?page=2&max_id=228079052756488192&q=ahlens&lang=sv&rpp=100" type="application/atom+xml"/><entry><id>tag:search.twitter.com,2005:228079052756488192</id><published>2012-07-25T10:47:38Z</published><link rel="alternate" href="http://twitter.com/maria_s/statuses/228079052756488192" type="text/html"/><content type="html">@<a class=" " href="https://twitter.com/puckosmulan">puckosmulan</a> Då kan vi ses vid Åhléns-ingången vid 15 så går vi till det bezta ztellet i ztada.</content><updated>2012-07-25T10:47:38Z</updated><link rel="image" href="http://a0.twimg.com/profile_images/2187959470/2c83d96e94ef11e1989612313815112c_7__1__normal.jpg" type="image/png"/><twitter:geo/><twitter:metadata><twitter:result_type>recent</twitter:result_type></twitter:metadata><twitter:source><a href="http://www.tweetdeck.com" rel="nofollow">TweetDeck</a></twitter:source><twitter:lang>sv</twitter:lang><author><name>maria_s (Maria Sjöberg)</name><uri>http://twitter.com/maria_s</uri></author></entry>

As far as I can tell, these errors occur randomly. I have not been able to detect a pattern.
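
A cut-off Atom response can at least be detected before it is processed further. Here is a minimal sketch in Python using the standard library XML parser. is_well_formed_atom is an illustrative name, and the check only confirms that the document parses and that its root element is an Atom feed:

```python
import xml.etree.ElementTree as ET

ATOM_FEED_TAG = "{http://www.w3.org/2005/Atom}feed"


def is_well_formed_atom(body):
    """Return True if body (bytes or str) parses as XML with an Atom <feed> root."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # Truncated or otherwise malformed XML ends up here
        return False
    return root.tag == ATOM_FEED_TAG
```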

5 days 15 hours ago
episod
@episod Taylor Singletary

Thank you for your details!

5 days 11 hours ago
dpn
@dpn David P. Novakovic

OK, here are some response headers from a call to friends_timeline.

In Python I did "len(json)" and got this number: 119528

This doesn't seem to match the Content-Length in the response headers, though.

{'Status': '200 OK', 'X-Ratelimit-Remaining': '314', 'X-Transaction': '94cd0074f6a44297', 'Content-Encoding': 'gzip', 'Set-Cookie': 'guest_id="v1:134397897415680524";Expires=Sun, 3-Aug-2014 07:29:34 GMT;Path=/;Domain=.twitter.com,lang=en', 'Expires': 'Tue, 31 Mar 1981 05:00:00 GMT', 'X-Access-Level': 'read-write-directmessages', 'Content-Length': '16402', 'Server': 'tfe', 'Last-Modified': 'Fri, 03 Aug 2012 07:29:34 GMT', 'X-Ratelimit-Limit': '350', 'Pragma': 'no-cache', 'Cache-Control': 'no-cache, no-store, must-revalidate, pre-check=0, post-check=0', 'Date': 'Fri, 03 Aug 2012 07:29:34 GMT', 'X-Frame-Options': 'SAMEORIGIN', 'Content-Type': 'application/json;charset=utf-8', 'X-Ratelimit-Class': 'api_identified', 'X-Ratelimit-Reset': '1343981440'}
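
One thing worth keeping in mind with these headers: when Content-Encoding is gzip, Content-Length counts the compressed bytes on the wire, so it is not expected to equal the length of the decoded JSON. A more telling check is whether the number of raw bytes received matches Content-Length and whether the gzip stream reaches its end marker. A minimal sketch in Python 3 (inspect_gzip_body is a made-up helper name, and it assumes you can get at the raw, still-compressed body):

```python
import gzip
import zlib


def inspect_gzip_body(raw, declared_length):
    """Check a raw gzip-encoded body against its declared Content-Length."""
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    try:
        decoded = decompressor.decompress(raw)
        complete = decompressor.eof  # True only if the gzip trailer was reached
    except zlib.error:
        decoded, complete = b"", False
    return {
        "length_matches": len(raw) == declared_length,
        "stream_complete": complete,
        "decoded_bytes": len(decoded),
    }


if __name__ == "__main__":
    payload = b'{"text": "' + b"a" * 5000 + b'"}'
    raw = gzip.compress(payload)
    print(inspect_gzip_body(raw, len(raw)))                   # full body
    print(inspect_gzip_body(raw[: len(raw) // 2], len(raw)))  # body cut in half
```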

4 days 19 hours ago
dpn
@dpn David P. Novakovic

Here's another one with full URL and headers.

headers: {'Status': '200 OK', 'X-Ratelimit-Remaining': '326', 'X-Transaction': 'b58b122a66a27ece', 'Content-Encoding': 'gzip', 'Set-Cookie': 'guest_id="v1:134398102359915433";Expires=Sun, 3-Aug-2014 08:03:43 GMT;Path=/;Domain=.twitter.com,lang=id', 'Expires': 'Tue, 31 Mar 1981 05:00:00 GMT', 'X-Access-Level': 'read-write-directmessages', 'Content-Length': '32786', 'Server': 'tfe', 'Last-Modified': 'Fri, 03 Aug 2012 08:03:43 GMT', 'X-Ratelimit-Limit': '350', 'Pragma': 'no-cache', 'Cache-Control': 'no-cache, no-store, must-revalidate, pre-check=0, post-check=0', 'Date': 'Fri, 03 Aug 2012 08:03:43 GMT', 'X-Frame-Options': 'SAMEORIGIN', 'Content-Type': 'application/json;charset=utf-8', 'X-Ratelimit-Class': 'api_identified', 'X-Ratelimit-Reset': '1343982382'}
actual_content_length: 390374
request_url: http://api.twitter.com/1/statuses/home_timeline.json?count=200&oauth_nonce=47392d3d75154b6695d0a475b7d5b829&oauth_timestamp=1343981008&oauth_consumer_key=OURKEY&oauth_signature_method=HMAC-SHA1&oauth_version=1.0&oauth_token=OAUTHTOKEN&oauth_signature=H6f8FFzuTd6BpLDKgRUR0KnUu1g%3D&page=1
request_headers: {'Expect': '', 'Pragma': ''}

4 days 19 hours ago
Mezgrman
@Mezgrman Mr. J. Mezgr

I've also been having trouble with incomplete JSON responses recently. I didn't modify my code, but it keeps crashing due to that error, so I assume that this isn't an issue on my side…
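
Until the cause is known, a client can at least avoid crashing by retrying when the body does not parse. Elsewhere in this thread a repeat of the same request about 10 seconds later came back complete. The following is only a workaround sketch in Python: fetch stands for whatever zero-argument function you already use to download the body, and the attempt count and delay are arbitrary illustrative defaults:

```python
import json
import time


def fetch_json_with_retry(fetch, attempts=3, delay=1.0):
    """Call fetch() until its result parses as JSON, up to `attempts` times."""
    last_error = None
    for attempt in range(attempts):
        body = fetch()
        try:
            return json.loads(body)
        except ValueError as exc:  # raised for truncated or invalid JSON
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(delay)
    # Every attempt failed; surface the last parse error to the caller
    raise last_error
```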

4 days 15 hours ago