Using User Stream to get older tweets

anfreymann
@anfreymann Andreas Freymann

I am writing an application that collects tweets containing a specific keyword using the Streaming API (Firehose).
My application is divided into three steps:

1) Get all tweets containing a specific keyword using the Streaming API (Firehose)
2) Get all users who appear in the collected tweets
3) Get all other tweets of each user (tweets from the last few days plus any tweets the user posts in the future)

HOW IT WORKS:
1) This works pretty well.
2) This also works well.
3) Here I have a real problem.
My database holds all the relevant users, and a background process loops over them to check whether each user's past tweets (max 3200 tweets) have been collected.

If the past tweets have not been collected yet, I fetch them from api.twitter.com using the user_id
(https://api.twitter.com/1/statuses/user_timeline.json?include_entities=true&include_rts=true&user_id="XYZ"&count=3200)

If the past tweets have already been collected, I call the same endpoint with the since_id argument to get only the tweets posted after that ID
(https://api.twitter.com/1/statuses/user_timeline.json?include_entities=true&include_rts=true&user_id="XYZ"&since_id="XYZ")
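A minimal sketch of how the two cases above could share one URL builder. The function name build_timeline_url and its parameters are made up for illustration; only the endpoint and the query arguments come from the URLs above. count defaults to 200 here, which is the per-request cap on this endpoint.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.twitter.com/1/statuses/user_timeline.json"


def build_timeline_url(user_id, since_id=None, count=200):
    """Build a user_timeline URL (hypothetical helper, not part of any library).

    Without since_id this is the initial backfill request; with since_id
    it asks only for tweets newer than the last one already stored.
    """
    params = {
        "include_entities": "true",
        "include_rts": "true",
        "user_id": user_id,
        "count": count,
    }
    if since_id is not None:
        params["since_id"] = since_id
    return BASE_URL + "?" + urlencode(params)
```

The background process would then call build_timeline_url(user_id) for a user with no stored tweets, and build_timeline_url(user_id, since_id=newest_stored_id) afterwards.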

PROBLEM:
Step three is implemented as a background process that collects all tweets (the last 3200 plus the current, real-time tweets) for each user stored in the database. The problem is that I am using api.twitter.com to get the tweets for each user. With thousands of users I would need thousands of requests to api.twitter.com, but there is a limit of 150 requests per hour.
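To put rough numbers on this, here is a small back-of-the-envelope calculation. The function name and the user counts are purely illustrative; only the 150 requests per hour figure is the actual limit I am hitting.

```python
def hours_for_one_pass(num_users, requests_per_user, limit_per_hour):
    """Hours needed to touch every user once under an hourly request limit."""
    total_requests = num_users * requests_per_user
    return total_requests / limit_per_hour


# Illustrative only: 3000 users, one since_id request each, 150 requests/hour.
print(hours_for_one_pass(3000, 1, 150))  # 20.0
```

So even a single update pass over a few thousand users takes most of a day, before counting the extra requests needed to backfill each user's history.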

QUESTION:
Is there a way to use the User Stream to get a user's old tweets from a given time frame using since_id?

Thanks!

11 weeks 10 hours ago

Replies

episod
@episod Taylor Singletary

User Streams is really only about "right now" -- it doesn't serve tweets from the past.

Rate limiting is a fact of life in the Twitter API. If you want to work on behalf of users to fetch their timelines, you'll want to leverage their explicit authorization through OAuth so that you can perform 350 requests per hour directly on their behalf.

Keep in mind that the user timeline has a limit of 200 tweets per request and that the paging metaphor is about to be deprecated. You should begin using a combination of since_id, max_id, and count on user timeline without the usage of "page."
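As a rough sketch of that cursoring approach (not an official client): the loop below walks backwards through a timeline by lowering max_id after each request. fetch_page is a hypothetical callable you would supply yourself; it is assumed to perform one user timeline request with the given arguments and return a list of tweet dicts that each carry an "id" key. The 200 and 3200 defaults are the per-request and per-timeline limits discussed in this thread.

```python
def fetch_history(fetch_page, user_id, since_id=None,
                  page_size=200, max_tweets=3200):
    """Collect a user's tweets newest-first without using "page".

    fetch_page(user_id=..., count=..., since_id=..., max_id=...) is a
    hypothetical caller-supplied function that makes one API request.
    """
    collected = []
    max_id = None  # no upper bound on the first request
    while len(collected) < max_tweets:
        page = fetch_page(user_id=user_id, count=page_size,
                          since_id=since_id, max_id=max_id)
        if not page:
            break  # nothing older is left (or since_id has been reached)
        collected.extend(page)
        # max_id is inclusive, so ask for ids strictly below the oldest seen
        max_id = min(tweet["id"] for tweet in page) - 1
    return collected[:max_tweets]
```

Passing since_id (the newest id you already stored) turns the same loop into an incremental update: it stops as soon as a request comes back empty. Every iteration is one request against your rate limit.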

11 weeks 10 hours ago