can only fetch 372 posts (instead of 3200)

FunSizeTJ
@FunSizeTJ Ticklish Junk

I am trying to access my last 3200 tweets, which I'm supposed to be able to do, right?

I started off with

twurl -t "/1/statuses/user_timeline.xml?screen_name=FunSizeTJ&count=200"

and then used 'max_id' until this:

twurl -t "/1/statuses/user_timeline.xml?screen_name=FunSizeTJ&count=200&max_id=8948999510"

but I can't get anything older than that one, which is only 372 of 3200.
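
For reference, max_id on this endpoint is inclusive: a request returns tweets with an ID less than or equal to the value given, so reusing the oldest ID from one batch repeats that tweet at the top of the next. Below is a minimal sketch of stepping past it, assuming a shell with 64-bit arithmetic. The helper names `next_max_id` and `max_id_path` are made up for illustration and are not part of twurl. The script only prints the request path and sends nothing to the API.

```shell
#!/bin/sh
# Sketch only: build the next max_id request path from the oldest ID seen so far.
# Helper names are hypothetical; nothing here calls the API.

# max_id is inclusive, so subtract 1 to avoid fetching the boundary tweet again.
# Assumes the shell's arithmetic is 64-bit (tweet IDs exceed 32 bits).
next_max_id() {
    echo $(( $1 - 1 ))
}

# Print the request path for one batch of up to 200 tweets at or below an ID.
# $1 = screen name, $2 = max_id
max_id_path() {
    printf '/1/statuses/user_timeline.xml?screen_name=%s&count=200&max_id=%s\n' "$1" "$2"
}

oldest_seen=8948999510   # the oldest tweet ID mentioned in the post above
max_id_path FunSizeTJ "$(next_max_id "$oldest_seen")"
```

The printed path could then be passed to twurl in the same way as the commands above, for example `twurl -t "$(max_id_path FunSizeTJ 8948999509)"`.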

I tried going back in a web browser, and it stopped at that exact same one.

(I also tried "page" instead of "max_id" but that didn't work any better.)

edited to add - oh, and I'm nowhere near my quota, so it's not that…

44 weeks 4 days ago

Replies

episod
@episod Taylor Singletary

You should also request that retweets be included, as they count toward the 3,200 you're fetching. You'll find that when you're walking a timeline of known length, and not just searching for specific ranges, it's much easier to just paginate with page & count together, no max_id.

3200/200 = 16, so the last page of available results for you is likely to be the 16th.

You could then walk through the whole set in 16 requests, oldest to most recent:

  1. twurl -t "/1/statuses/user_timeline.xml?screen_name=FunSizeTJ&count=200&include_rts=true&page=16"
  2. twurl -t "/1/statuses/user_timeline.xml?screen_name=FunSizeTJ&count=200&include_rts=true&page=15"
  3. twurl -t "/1/statuses/user_timeline.xml?screen_name=FunSizeTJ&count=200&include_rts=true&page=14"
  4. twurl -t "/1/statuses/user_timeline.xml?screen_name=FunSizeTJ&count=200&include_rts=true&page=13"
  5. twurl -t "/1/statuses/user_timeline.xml?screen_name=FunSizeTJ&count=200&include_rts=true&page=12"

And so on...
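
The full walk can be sketched as a loop rather than typed out by hand. This is a minimal illustration, assuming the 3,200-tweet limit and 200 tweets per page described above. The helper names `page_total` and `timeline_path` are made up for this example and are not part of twurl. The script only prints the request paths, oldest page first, and sends nothing to the API.

```shell
#!/bin/sh
# Sketch only: print the request paths for walking a timeline page by page,
# oldest page first. Helper names are hypothetical; nothing here calls the API.

# Number of pages needed to cover a limit at a given page size, rounded up.
# $1 = total tweets available (e.g. 3200), $2 = tweets per page (e.g. 200)
page_total() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Print the request path for one page, with retweets included.
# $1 = screen name, $2 = page number, $3 = tweets per page (defaults to 200)
timeline_path() {
    printf '/1/statuses/user_timeline.xml?screen_name=%s&count=%s&include_rts=true&page=%s\n' \
        "$1" "${3:-200}" "$2"
}

pages=$(page_total 3200 200)
page=$pages
while [ "$page" -ge 1 ]; do
    timeline_path FunSizeTJ "$page"
    page=$((page - 1))
done
```

Each printed path could be handed to twurl and saved to its own file, for example `twurl -t "$(timeline_path FunSizeTJ 16)" > 16.xml`.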

44 weeks 4 days ago
FunSizeTJ
@FunSizeTJ Ticklish Junk

I thought that 'page' was going to be deprecated, so I stopped using it.

44 weeks 4 days ago
episod
@episod Taylor Singletary

The page parameter is really only "deprecated" for methods that use cursors instead -- a more efficient way for our backend to process that kind of data. max_id and since_id can be used in tandem with page fine, but in this particular use case, it's much easier to just paginate more traditionally.

44 weeks 4 days ago
FunSizeTJ
@FunSizeTJ Ticklish Junk

Agreed that pagination is much simpler.

However, include_rts was not the answer, which I knew because I don't retweet that often, and certainly not enough to make up the difference between 372 fetched vs 3200.

I did as you suggested and ran:

for COUNT in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
do
    twurl -t "/1/statuses/user_timeline.xml?screen_name=FUNSIZETJ&count=200&include_rts=true&page=$COUNT" > "$COUNT.xml"
done

(The script also checks to make sure that it got "Status: 200 OK" before moving on to the next page, but that's the gist of it)

I then checked the $COUNT.xml files for ''

Here's how many tweets I got back

1.xml: 170
2.xml: 200
3.xml: 120
4.xml: 133
5.xml: 77
6.xml: 47
7.xml: 50
8.xml: 75
9.xml: 9
10.xml: 20
11.xml: 25
12.xml: 37
13.xml: 19
14.xml: 34
15.xml: 46
16.xml: 41
17.xml: 0

(18-25 were also empty, I just ran it to 25 to make sure twitter wouldn't send anything more than 16)

If you add up all times that '' is found, it looks like 1103.

BUT, there are duplicates. A lot of duplicates.

cat *.xml | fgrep '<text>' | sort -u | wc -l

reports 372

Which is the exact same number as I get through the website.
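
The raw-versus-unique comparison above can be wrapped in a few small functions. This is a rough sketch that follows the same approach as the fgrep line (matching lines that contain a fixed marker string such as '<text>'). The function names are made up for illustration, and the sample files are invented data, not real API output. Counting unique lines this way treats two different tweets with identical text as one, so it is an estimate rather than an exact tally.

```shell
#!/bin/sh
# Sketch only: compare the raw number of marker lines across saved pages with
# the number of distinct marker lines. Function names are hypothetical.

# Per-file count of lines containing the marker (fixed string, not a regex).
count_per_file() {
    marker=$1; shift
    for f in "$@"; do
        printf '%s: %s\n' "$f" "$(grep -cF -- "$marker" "$f")"
    done
}

# Total number of marker lines across all files, duplicates included.
count_total() {
    marker=$1; shift
    cat -- "$@" | grep -cF -- "$marker"
}

# Number of distinct marker lines across all files.
count_unique() {
    marker=$1; shift
    cat -- "$@" | grep -F -- "$marker" | sort -u | wc -l | tr -d ' '
}

# Demonstration on invented sample data: two pages that overlap by one line.
tmp=$(mktemp -d)
printf '  <text>a</text>\n  <text>b</text>\n' > "$tmp/1.xml"
printf '  <text>b</text>\n  <text>c</text>\n' > "$tmp/2.xml"
count_total '<text>' "$tmp"/*.xml     # 4 marker lines in total
count_unique '<text>' "$tmp"/*.xml    # 3 distinct lines
rm -r "$tmp"
```

A large gap between the two numbers, as with 1103 versus 372 here, points to the same tweets being returned on more than one page.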

Something is wrong here.

ps - in that 'twurl' line above, I also tried json, rss, and atom (to see if they gave any different results than xml) but they did not.

44 weeks 4 days ago
episod
@episod Taylor Singletary

Some accounts can get in a bad caching state with so-called "truncated timelines" -- yours might be a case of this. Follow up with support to see if they can relieve your account's issue: https://support.twitter.com/forms

44 weeks 4 days ago
FunSizeTJ
@FunSizeTJ Ticklish Junk

Will try that. Thanks.

44 weeks 4 days ago
kieronam
@kieronam Kieron Merrett

I'm also having this problem. Did you get it solved?

My tweet count is over 10,000, but when I try returning 16 pages of 200 tweets from my userTimeline I don't get anything near 3200 tweets.

27 weeks 2 hours ago
kieronam
@kieronam Kieron Merrett

Is it the case that we should get 3200 statuses back (in a number of pages) regardless of how old they are? I am having the same problem, retrieving 16 pages of 200 tweets from the userTimeline, but I don't actually get anything like 200 tweets back per page (and this is with RTs enabled).

27 weeks 2 hours ago
episod
@episod Taylor Singletary

Some timelines become corrupted due to caching issues over time, especially if you've ever deleted tweets. If you believe this may be a corruption issue on your timeline, contact @Support through: https://support.twitter.com/forms/general

26 weeks 6 days ago
kieronam
@kieronam Kieron Merrett

Thanks for your advice!

I contacted support and they have just replied and told me I should check on the forums :-) I will reply and refer them back to this.

25 weeks 6 days ago
kieronam
@kieronam Kieron Merrett

Hi again Taylor,

I'd be grateful for your advice on how to tackle this. I'm keen to get this timeline caching issue resolved, but when I contact support they just don't seem to believe me that this is an issue for them. They are just referring me to the API documentation and these discussion forums!

Whether I use the API or simply scroll down my timeline on twitter.com, only my most recent tweets are showing - only a few hundred tweets, far fewer than 3200 - even though my tweet count is above 10,000.

Thanks for your help!

11 weeks 15 hours ago
episod
@episod Taylor Singletary

It's unfortunate that they did not assist you -- sorry about that. I did the equivalent of hitting the side of the television set and it looks like you can walk your timeline via the API now (which means it should also be possible on the site). Cheers!

11 weeks 13 hours ago
FunSizeTJ
@FunSizeTJ Ticklish Junk

I tried backing up a different account, and I only got 372 there too.

I put my script here

http://dl.dropbox.com/u/18414/tmp/2011-09-30-Twitter-API/backuptweets-using-pages.zsh

so you can see how I'm doing it. Maybe I'm doing something wrong, but I don't know what.

44 weeks 4 days ago
MatBoy007
@MatBoy007 MatBoy

Why would you have multiple pages instead of one? I'm trying to do this using PHP, grabbing all tweets up to the last 3200.

42 weeks 3 days ago