Hi,
Lately we've been receiving incomplete JSON responses from multiple Twitter API endpoints, using different Python libraries. The cause is always the same: the response we get from the Twitter API (format=json) is sometimes incomplete (an unfinished string, which is invalid JSON).
We've encountered this:
* on different endpoints: statuses/mentions, statuses/user_timeline, search, users/lookup
* using different libraries: urllib2 returns an invalid json response, and httplib raises an IncompleteRead error
* and on different environments: Amazon EC2 servers (running Ubuntu 10), as well as localhost (running something else).
Could this be a Twitter API issue, or something on our end?
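Until the cause is known, one workaround is to treat a truncated body as a transient failure and retry. The following is a minimal sketch in Python 3 (where httplib is named http.client), not a confirmed fix. `load_json_with_retry` and the `fetch` callable are hypothetical names standing in for whatever HTTP call you already make.

```python
import http.client
import json


def load_json_with_retry(fetch, attempts=3):
    """Call fetch() until it returns a body that parses as JSON.

    `fetch` is a hypothetical zero-argument callable that returns the raw
    response body as bytes.
    """
    last_error = None
    for _ in range(attempts):
        try:
            body = fetch()
            return json.loads(body.decode("utf-8"))
        except http.client.IncompleteRead as exc:
            # Fewer bytes arrived than the Content-Length header promised.
            last_error = exc
        except ValueError as exc:
            # json.JSONDecodeError and UnicodeDecodeError both subclass
            # ValueError, so a body cut off mid-string lands here.
            last_error = exc
    raise RuntimeError("no complete JSON after %d attempts" % attempts) from last_error
```

Retrying hides the symptom only; logging `last_error` alongside the response headers would still help with diagnosis.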

Replies
I'm investigating similar reports to this issue -- we're having trouble reproducing. Is there any way you can capture the HTTP headers you're sending and the HTTP headers you get in response for a request that exhibits this behavior?
Yes, I think we can do that. I'll come back with some headers :)
Here are some header examples:
Request headers:
{'Host': 'search.twitter.com',
'User-agent': 'Python-urllib/2.6',
'Connection': 'close'};
Response headers:
['Cache-Control: max-age=15, must-revalidate, max-age=300\r\n',
'Expires: Thu, 26 Jul 2012 07:07:03 GMT\r\n',
'Content-Type: application/json;charset=utf-8\r\n',
'X-Transaction: d6834a10b953e1a1\r\n',
'X-Frame-Options: SAMEORIGIN\r\n',
'Content-Length: 66658\r\n',
'Vary: Accept-Encoding\r\n',
'Date: Thu, 26 Jul 2012 07:02:03 GMT\r\n',
'X-Varnish: 68396455\r\n',
'Age: 0\r\n',
'Server: tfe\r\n',
'X-Cache: MISS from somehost.host\r\n',
'X-Cache-Lookup: MISS from somehost.host:3128\r\n',
'Via: 1.1 varnish, 1.0 somehost.host:3128 (squid/2.7.STABLE7)\r\n',
'Connection: close\r\n']
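With headers captured in this form, a truncated body can be spotted by comparing the declared Content-Length with the number of bytes actually read. This is a minimal Python sketch; `parse_header_lines` and `missing_bytes` are hypothetical helper names, and `body` is assumed to be the raw bytes as received, before any decompression.

```python
def parse_header_lines(lines):
    """Turn raw 'Name: value' header lines into a dict with lower-cased names."""
    headers = {}
    for line in lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def missing_bytes(header_lines, body):
    """Return how many bytes short the body is, or None if no length was declared."""
    declared = parse_header_lines(header_lines).get("content-length")
    if declared is None:
        return None
    return int(declared) - len(body)
```

A result greater than zero means the connection closed before the full body arrived, which would point at the transport or a proxy rather than at the JSON itself.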
Thank you for your help, I'll submit this to the investigators.
Appears my problem is a dupe of this too. Sorry about reposting.
https://dev.twitter.com/discussions/9636
Hi David,
Can you share a request and response cycle exhibiting this problem much as @mihneagiurgea has done above? Very useful in getting to the bottom of this. Thanks!
I have the same problem, using the Streaming API endpoints 'statuses/filter' and 'statuses/sample'.
I use node.js to connect, read the JSON chunks and currently simply store the received messages in MySQL.
When using the 'statuses/filter' endpoint I store about 6.64 tweets per second. The error rate is about 0.045 (error/total).
(Running time: 12 h - Messages: total 286881 / limit 2 / tweet 274041 / error 12838)
When using the 'statuses/sample' endpoint, I store about 54.29 tweets per second. The error rate is about 0.037 (error/total).
(Running Time: 1 h - Messages: total 195458 / delete 15593 / tweet 172541 / error 7324)
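The quoted figures can be re-derived from the raw counts. A quick check in Python, using only the numbers above, suggests that the per-second figures are computed from the total message count rather than from tweets alone.

```python
# Counts and running times exactly as quoted in the post.
filter_total, filter_errors, filter_seconds = 286881, 12838, 12 * 3600
sample_total, sample_errors, sample_seconds = 195458, 7324, 1 * 3600

# Messages per second over the whole run.
filter_rate = filter_total / filter_seconds
sample_rate = sample_total / sample_seconds

# Share of messages that failed to parse.
filter_error_rate = filter_errors / filter_total
sample_error_rate = sample_errors / sample_total

print(round(filter_rate, 2), round(filter_error_rate, 3))
print(round(sample_rate, 2), round(sample_error_rate, 3))
```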
The chopped JSON chunks look like this:
and
There seems to be no pattern to where the messages are chopped.
I did the tests at home with only 16 Mbit/s downstream.
To check whether it is a bandwidth problem, I will run the tests on a server with higher bandwidth and report back when I am done.
I will also see if I can supply you with some headers.
I ran the 'statuses/sample' stream for an hour on the server. It still looks like the same problem.
I think I found the problem on my side.
As far as I understand the networking side, an HTTP stream may be fragmented, so a single JSON object can arrive split across several chunks.
The parser I use (https://github.com/ttezel/twit) assumes that each JSON object is delivered whole and is never fragmented.
It also does not make use of the 'delimited' parameter when connecting to the stream.
If a JSON object is fragmented, the parser neither combines the fragments before parsing nor saves a fragment for later.
Therefore it fails to recognize that the object is spread over 2 packets.
Can anyone using another library confirm this behaviour?
I will try to fix that in the code I use and report back.
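The usual remedy is to buffer incoming chunks and parse only once a full message has arrived. Below is a minimal Python sketch of that idea, not the fix that was applied to the parser mentioned above. `DelimitedJsonBuffer` is a hypothetical name, and the sketch assumes messages are separated by "\r\n" and that blank lines are keep-alives to be skipped.

```python
import json


class DelimitedJsonBuffer:
    """Reassemble JSON messages that arrive split across network chunks."""

    def __init__(self, delimiter=b"\r\n"):
        self.delimiter = delimiter  # assumed message separator
        self.pending = b""          # bytes of a message not yet complete

    def feed(self, chunk):
        """Add a chunk and return every message completed by it."""
        self.pending += chunk
        # Everything before the last delimiter is complete; the tail is
        # kept for the next call.
        *complete, self.pending = self.pending.split(self.delimiter)
        # Blank parts are skipped (assumed to be keep-alive newlines).
        return [json.loads(part.decode("utf-8")) for part in complete if part.strip()]
```

Each chunk read from the socket goes through `feed()`, and only the returned objects are stored. A fragment is never parsed on its own, so a message split over two packets no longer counts as an error.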
We've been hitting the same issue multiple times a day during regular imports of tweets over the last week. It didn't happen prior to that. Regrettably we don't have an easy way of capturing the request/response headers, but we just wanted to affirm that we're getting these malformed responses as well.
My previous post is still in moderation, I guess because I posted a link to GitHub.
My problem is solved: there was a fix for it in one of the parser's branches.
In short, until the moderators approve my rather long post:
The parser I used did not check for fragmented JSON objects, and that caused the bogus behaviour.
Any updates on this? Have you guys managed to confirm that this is indeed a Twitter API issue?
I still need to get logging of metadata for the requests to Twitter that fail. This is not trivial for us, as it means deploying new code to a high-volume site :)
I get the same problem with malformed atom-responses.
The search line used is:
http://search.twitter.com/search.atom?lang=sv&q=+ahlens&rpp=100
This is when it fails and returns less data than it is supposed to.
And this is how it should look requested about 10 seconds after the first malformed one.
<?xml version="1.0" encoding="UTF-8"?>
As far as I can tell, these errors occur randomly. I have not been able to detect a pattern.
Thank you for your details!
OK, here are some response headers, from a call to friends_timeline.
In Python I did "len(json)" and got this number: 119528
This doesn't seem to match the content-length in the response headers though.
{'Status': '200 OK', 'X-Ratelimit-Remaining': '314', 'X-Transaction': '94cd0074f6a44297', 'Content-Encoding': 'gzip', 'Set-Cookie': 'guest_id="v1:134397897415680524";Expires=Sun, 3-Aug-2014 07:29:34 GMT;Path=/;Domain=.twitter.com,lang=en', 'Expires': 'Tue, 31 Mar 1981 05:00:00 GMT', 'X-Access-Level': 'read-write-directmessages', 'Content-Length': '16402', 'Server': 'tfe', 'Last-Modified': 'Fri, 03 Aug 2012 07:29:34 GMT', 'X-Ratelimit-Limit': '350', 'Pragma': 'no-cache', 'Cache-Control': 'no-cache, no-store, must-revalidate, pre-check=0, post-check=0', 'Date': 'Fri, 03 Aug 2012 07:29:34 GMT', 'X-Frame-Options': 'SAMEORIGIN', 'Content-Type': 'application/json;charset=utf-8', 'X-Ratelimit-Class': 'api_identified', 'X-Ratelimit-Reset': '1343981440'}
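One possible explanation for the mismatch, not confirmed in this thread: the headers above include Content-Encoding: gzip, and in that case Content-Length counts the compressed bytes on the wire, while len() on the decoded body counts the decompressed text. The Python sketch below uses a made-up payload to show the two numbers diverging.

```python
import gzip
import json

# Made-up payload standing in for a timeline response.
payload = json.dumps([{"text": "hello world"}] * 500).encode("utf-8")
compressed = gzip.compress(payload)

# What Content-Length reports when the body is sent gzip-encoded.
wire_length = len(compressed)
# What you measure after the HTTP library has decompressed the body.
decoded_length = len(gzip.decompress(compressed))

print(wire_length, decoded_length)
```

To check for truncation in this situation, compare Content-Length with the byte count of the body before decompression.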
Here's another one with full url and headers.
I'm also having trouble with incomplete JSON responses recently. I didn't modify my code, but it keeps crashing due to that error. So I assume that this isn't an issue on my side…