Этот код работал для меня.

Question

Jul 06, 2017, 02:16 PM

Этот код работал для меня.

аюсь взять все открытые твиты в хэштег, но мой код не идет дальше, чем 299 твитов.

Я также пытаюсь получать твиты из определенной временной шкалы, такие как твиты, только в мае 2015 года и июле 2016 года. Есть ли способ сделать это в основном процессе или я должен написать небольшой код для него?

Вот мой код:

# if this is the first time, creates a new array which
# will store max id of the tweets for each keyword
if not os.path.isfile("max_ids.npy"):
    max_ids = np.empty(len(keywords))
    # every value is initialized as -1 in order to start from the beginning the first time program run
    max_ids.fill(-1)
else:
    max_ids = np.load("max_ids.npy")  # loads the previous max ids

# if there is any new keywords added, extends the max_ids array in order to correspond every keyword
if len(keywords) > len(max_ids):
    new_indexes = np.empty(len(keywords) - len(max_ids))
    new_indexes.fill(-1)
    max_ids = np.append(arr=max_ids, values=new_indexes)

count = 0
for i in range(len(keywords)):
    since_date="2015-01-01"
    sinceId = None
    tweetCount = 0
    maxTweets = 5000000000000000000000  # maximum tweets to find per keyword
    tweetsPerQry = 100
    searchQuery = "#{0}".format(keywords[i])
    while tweetCount < maxTweets:
        if max_ids[i] < 0:
                if (not sinceId):
                    new_tweets = api.search(q=searchQuery, count=tweetsPerQry)
                else:
                    new_tweets = api.search(q=searchQuery, count=tweetsPerQry,
                                            since_id=sinceId)
        else:
                if (not sinceId):
                    new_tweets = api.search(q=searchQuery, count=tweetsPerQry,
                                            max_id=str(max_ids - 1))
                else:
                    new_tweets = api.search(q=searchQuery, count=tweetsPerQry,
                                            max_id=str(max_ids - 1),
                                            since_id=sinceId)
        if not new_tweets:
            print("Keyword: {0}      No more tweets found".format(searchQuery))
            break
        for tweet in new_tweets:
            count += 1
            print(count)

            file_write.write(
                       .
                       .
                       .
                         )

            item = {
                .
                .
                .
                .
                .
            }

            # instead of using mongo's id for _id, using tweet's id
            raw_data = tweet._json
            raw_data["_id"] = tweet.id
            raw_data.pop("id", None)

            try:
                db["Tweets"].insert_one(item)
            except pymongo.errors.DuplicateKeyError as e:
                print("Already exists in 'Tweets' collection.")
            try:
                db["RawTweets"].insert_one(raw_data)
            except pymongo.errors.DuplicateKeyError as e:
                print("Already exists in 'RawTweets' collection.")

        tweetCount += len(new_tweets)
        print("Downloaded {0} tweets".format(tweetCount))
        max_ids[i] = new_tweets[-1].id

np.save(arr=max_ids, file="max_ids.npy")  # saving in order to continue mining from where left next time program run

Этот код работал для меня.

Ответы на вопрос(0)

Ваш ответ на вопрос

Популярные вопросы

Вы очень активны! Это здорово!

Этот код работал для меня.

Ответы на вопрос(0)

Ваш ответ на вопрос

Популярные вопросы