Twitter tweet scraping model (Fall 2022)
[Notice] Part of the "Journey to the Academic Researcher" series: the story of how I became a researcher.
This code scrapes tweet data from the Twitter server. Before running it, users must register for a developer account and obtain the various keys required for free access to the API, explaining their purpose for using the Twitter API as part of the application. If the stated goals are insufficiently explained, or are not an appropriate use of the API, Twitter denies the registration. Unauthorized users cannot access the full set of API functions and are strictly rate-limited. Even fully authorized users may issue at most 900 requests per 15 minutes, a cap that protects Twitter's servers from overload.
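That 900-request budget is what the req_counter / time.sleep logic in the loops below enforces by hand. The same idea as a small reusable helper, sketched here for clarity (the function name and defaults are mine, not part of the original code):

import time

def throttle(counter, limit=900, window=901):
    """Sleep out one rate-limit window after `limit` requests.

    Mirrors the 900-requests-per-15-minutes budget described above;
    returns the updated request counter.
    """
    counter += 1
    if counter >= limit:
        time.sleep(window)  # 15 minutes plus a second of slack
        counter = 0
    return counter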
import ssl
import time

import pandas as pd
import tweepy
from tqdm import tqdm

# Work around local SSL certificate errors by disabling verification.
ssl._create_default_https_context = ssl._create_unverified_context
# v1.1 credentials: api.get_status below uses these.
consumer_key = "ENTER YOUR KEY"
consumer_secret = "ENTER YOUR KEY"
access_key = "ENTER YOUR KEY"
access_secret = "ENTER YOUR KEY"

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth)

# v2 client for full-archive search; wait_on_rate_limit makes tweepy sleep
# automatically whenever the search quota is exhausted.
client = tweepy.Client(bearer_token='ENTER YOUR KEY', wait_on_rate_limit=True)
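Before kicking off a multi-hour scrape, it is worth confirming that the v1.1 keys actually work. This check is not in the original notebook, but verify_credentials is a standard tweepy call:

# Optional sanity check; assumes the auth objects defined above.
try:
    me = api.verify_credentials()
    print("Authenticated as @" + me.screen_name)
except tweepy.TweepyException as err:
    raise SystemExit("Authentication failed: " + str(err))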
Search_all_tweets version
database = pd.read_excel('Twitter_Official_1009_update22.xlsx', sheet_name='base')
database
  | news_date | End | Start | Official Account | Add | Unnamed: 5 |
---|---|---|---|---|---|---|
0 | 2018-02-27 | 2021-06-22 | 2017-08-27 | cvspharmacy | 2019-05-03 | True |
1 | 2018-04-03 | 2020-10-21 | 2017-10-03 | dominos | 2019-05-07 | True |
2 | 2018-04-10 | 2021-06-17 | 2017-10-10 | Nordstrom | 2018-03-29 | True |
3 | 2018-02-01 | 2021-06-16 | 2017-08-01 | kroger | 2019-03-24 | True |
4 | 2018-02-23 | 2021-05-10 | 2017-08-23 | Kohls | 2018-02-13 | True |
5 | 2018-02-01 | 2021-06-09 | 2017-08-01 | Lowes | 2019-10-23 | True |
6 | 2019-10-14 | 2021-03-12 | 2019-04-14 | lululemon | 2019-09-28 | True |
7 | 2018-02-12 | 2021-06-30 | 2017-08-12 | Starbucks | 2019-09-02 | True |
8 | 2018-01-23 | 2021-06-21 | 2017-07-23 | ATT | 2019-06-25 | True |
9 | 2018-02-05 | 2021-06-08 | 2017-08-05 | TDBank_US | 2018-07-13 | True |
10 | 2018-01-24 | 2021-06-30 | 2017-07-24 | Target | 2017-11-09 | True |
11 | 2018-02-06 | 2021-03-24 | 2017-08-06 | Tmobile | 2018-09-09 | True |
12 | 2018-02-13 | 2020-10-21 | 2017-08-13 | ultabeauty | 2017-09-30 | True |
13 | 2018-03-15 | 2020-08-11 | 2017-09-15 | Walmart | 2018-12-19 | True |
column_names = ["Account","gen_date", "text", "in_url", "in_media", "hash_text", "hash_count", "ret_count", "fav_count"]
tw_data = pd.DataFrame(columns = column_names)
tw_data
Account | gen_date | text | in_url | in_media | hash_text | hash_count | ret_count | fav_count |
---|---|---|---|---|---|---|---|---|
0 rows × 9 columns
req_counter = 0
for idx, data in tqdm(database.iterrows()):
    # Full-archive search for original English tweets only: the -is:
    # operators exclude retweets, replies, quotes, and promoted-only tweets.
    tweet_ex = client.search_all_tweets(
        query="from:" + data['Official Account']
              + " lang:en -is:retweet -is:reply -is:quote -is:nullcast",
        start_time=data['Start'], end_time=data['Add'], max_results=500)
    acc_name = data['Official Account']
    if tweet_ex.data is None:  # no tweets in this account's window
        continue
    for tweet in tweet_ex.data:
        # Check whether the number of v1.1 requests has reached the limit.
        req_counter += 1
        if req_counter == 900:
            time.sleep(901)  # wait out the 15-minute window
            req_counter = 0
        target_id = tweet.id
        ex_stat = api.get_status(target_id)  # v1.1 lookup for the metadata
        gen_date = ex_stat._json['created_at']
        cont = ex_stat._json['text']
        try:
            urls = ex_stat.entities['urls'][0]
            in_url = urls['url']
        except (KeyError, IndexError):
            in_url = "No URL included"
        try:
            media = ex_stat.entities['media'][0]
            in_media = media['type']
        except (KeyError, IndexError):
            in_media = "No Media included"
        hash_cont = []
        for hashtag in ex_stat.entities['hashtags']:
            hash_cont.append(hashtag['text'])
        hash_num = len(ex_stat.entities['hashtags'])
        ret_num = ex_stat._json['retweet_count']
        fav_num = ex_stat._json['favorite_count']
        # Earlier attempt at counting replies per tweet, left commented out:
        # for i in client.search_all_tweets(query="in_reply_to_status_id: " + str(target_id),
        #                                   start_time=data['Start'], end_time=data['End'], max_results=200):
        #     rep_counter += 1
        # rep_num = rep_counter
        rows = [acc_name, gen_date, cont, in_url, in_media, hash_cont,
                hash_num, ret_num, fav_num]
        tw_data.loc[len(tw_data)] = rows
14it [04:59, 21.43s/it]
tw_data
  | Account | gen_date | text | in_url | in_media | hash_text | hash_count | ret_count | fav_count |
---|---|---|---|---|---|---|---|---|---|
0 | cvspharmacy | Tue Jun 25 13:30:32 +0000 2019 | Using @SpaRoomProducts' therapeutic 100% Pure ... | https://t.co/Qlq5WI3pFD | No Media included | [] | 0 | 0 | 6 |
1 | cvspharmacy | Fri Jun 21 19:38:29 +0000 2019 | ☀️ 🖍️ Kick off summer with our free coloring ... | https://t.co/vdzONMLBRp | photo | [] | 0 | 3 | 13 |
2 | cvspharmacy | Wed Jun 19 13:34:45 +0000 2019 | Up to 50% of Americans don’t take their medica... | https://t.co/2M2B3fK3yF | No Media included | [] | 0 | 4 | 15 |
3 | cvspharmacy | Tue Jun 18 15:30:52 +0000 2019 | Slide into the season without sneezing. We del... | https://t.co/CszOREMaZb | No Media included | [] | 0 | 3 | 7 |
4 | cvspharmacy | Sun Jun 16 13:30:01 +0000 2019 | Thanks for everything you do, dads! Happy Fath... | No URL included | photo | [] | 0 | 5 | 12 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
474 | Walmart | Fri Dec 21 18:49:02 +0000 2018 | Not only is @MartinaMcBride a country supersta... | https://t.co/Gh4XcaI0ow | No Media included | [] | 0 | 6 | 22 |
475 | Walmart | Fri Dec 21 17:52:32 +0000 2018 | If you know, you know. https://t.co/F94z95mCGT | https://t.co/F94z95mCGT | No Media included | [] | 0 | 12 | 52 |
476 | Walmart | Thu Dec 20 21:02:11 +0000 2018 | In case Santa doesn’t get your letter, just se... | https://t.co/aXOWOFVH72 | No Media included | [] | 0 | 11 | 33 |
477 | Walmart | Thu Dec 20 20:07:34 +0000 2018 | Note to self: check for Cheeto-fingers before ... | https://t.co/MvG0Sa9Y9m | No Media included | [WalmartTopSeller] | 1 | 5 | 18 |
478 | Walmart | Thu Dec 20 18:08:46 +0000 2018 | It’s not over ‘til it’s over. And our 20 Days ... | https://t.co/JarvS9CPuP | No Media included | [] | 0 | 3 | 22 |
479 rows × 9 columns
tw_data.to_csv("FinalCheck_cycle22.csv")
PAGINATION & Search all tweets
database = pd.read_excel('Twitter_Official_1012_update25.xlsx', sheet_name='base')
database
  | news_date | End | Start | Official Account | Add | Unnamed: 5 |
---|---|---|---|---|---|---|
0 | 2018-02-15 | 2018-08-15 | 2017-08-15 | AMTDGroup | 2018-01-25 | True |
1 | 2018-04-06 | 2021-04-07 | 2017-10-06 | BestBuy | 2017-11-15 | True |
2 | 2018-02-22 | 2021-01-01 | 2017-08-22 | Avis | 2020-05-07 | True |
3 | 2018-02-06 | 2021-06-07 | 2017-08-06 | ChipotleTweets | 2021-03-24 | True |
4 | 2018-04-20 | 2019-07-15 | 2017-10-20 | Designer_Brands | 2018-06-19 | True |
5 | 2018-08-29 | 2021-05-18 | 2018-02-28 | DollarGeneral | 2018-03-18 | True |
6 | 2018-03-01 | 2019-12-18 | 2017-09-01 | darden | 2017-09-12 | True |
7 | 2020-10-04 | 2021-06-22 | 2020-04-04 | darden | 2021-01-26 | True |
8 | 2018-05-27 | 2021-06-28 | 2017-11-27 | Ford | 2021-03-07 | True |
9 | 2018-09-05 | 2019-07-15 | 2018-03-05 | FastenalCompany | 2018-04-01 | True |
10 | 2019-01-08 | 2020-06-21 | 2018-07-08 | footlocker | 2018-08-13 | True |
11 | 2018-07-06 | 2019-12-13 | 2018-01-06 | Genesco_Inc | 2019-04-30 | True |
12 | 2018-01-25 | 2020-10-02 | 2017-07-25 | Hyatt | 2017-08-15 | True |
13 | 2018-10-18 | 2019-04-18 | 2018-04-18 | habitburger | 2019-04-18 | True |
14 | 2018-04-25 | 2019-07-15 | 2017-10-25 | HDSupply | 2017-11-01 | True |
15 | 2018-01-11 | 2021-02-27 | 2017-07-11 | HiltonHotels | 2021-02-27 | True |
16 | 2020-04-27 | 2021-02-03 | 2019-10-27 | HRBlock | 2019-11-13 | True |
17 | 2018-08-23 | 2021-06-01 | 2018-02-23 | HSBC | 2018-03-07 | True |
18 | 2018-07-17 | 2021-01-02 | 2018-01-17 | Labcorp | 2020-03-19 | True |
19 | 2019-11-12 | 2020-05-12 | 2019-05-12 | ElPolloLoco | 2019-07-06 | True |
20 | 2018-01-18 | 2021-06-10 | 2017-07-18 | lukoilengl | 2020-03-03 | True |
21 | 2018-02-05 | 2018-08-05 | 2017-08-05 | lululemon | 2017-08-13 | True |
22 | 2018-04-01 | 2021-04-05 | 2017-10-01 | Macys | 2020-08-31 | True |
23 | 2018-03-09 | 2021-06-04 | 2017-09-09 | Marriott | 2017-10-23 | True |
24 | 2018-01-17 | 2019-07-15 | 2017-07-17 | MurphyUSA | 2017-08-09 | True |
25 | 2018-01-17 | 2019-02-27 | 2017-08-31 | MurphyUSA | 2019-02-27 | True |
26 | 2020-08-10 | 2021-03-09 | 2020-02-10 | MurphyUSA | 2020-10-02 | True |
27 | 2018-03-15 | 2021-06-29 | 2017-09-15 | Nike | 2021-06-29 | True |
28 | 2019-04-18 | 2019-10-18 | 2018-10-18 | OlliesOutlet | 2018-12-19 | True |
29 | 2019-07-26 | 2020-01-26 | 2019-01-26 | BankOZK | 2019-02-18 | True |
30 | 2018-04-02 | 2021-05-19 | 2017-10-02 | PolarisInc | 2017-10-09 | True |
31 | 2018-04-10 | 2021-06-22 | 2017-10-10 | childrensplace | 2018-09-23 | True |
32 | 2018-11-30 | 2019-09-06 | 2018-05-30 | PlanetFitness | 2019-03-31 | True |
33 | 2018-02-01 | 2021-06-07 | 2017-08-01 | PNCBank | 2017-08-30 | True |
34 | 2018-01-26 | 2021-05-06 | 2017-07-26 | PVHCorp | 2018-05-18 | True |
35 | 2018-03-19 | 2018-10-20 | 2018-03-19 | SportsmansWH | 2018-10-20 | True |
36 | 2019-05-08 | 2020-12-10 | 2018-11-08 | SportsmansWH | 2019-10-29 | True |
37 | 2018-03-07 | 2020-05-04 | 2017-09-07 | TruistNews | 2020-01-29 | True |
38 | 2020-12-02 | 2021-06-02 | 2020-06-02 | DelTaco | 2020-06-24 | True |
39 | 2018-03-22 | 2019-07-15 | 2017-09-22 | TractorSupply | 2018-04-29 | True |
40 | 2018-03-15 | 2021-04-04 | 2017-09-15 | Wendys | 2020-02-19 | True |
41 | 2018-01-11 | 2021-02-14 | 2017-07-11 | WolverineWW | 2017-09-07 | True |
42 | 2019-05-08 | 2020-12-10 | 2018-11-08 | SportsmansWH | 2019-10-29 | True |
43 | 2018-01-25 | 2021-06-22 | 2017-07-25 | riteaid | 2018-03-21 | True |
column_names = ["Account","gen_date", "text", "in_url", "in_media", "hash_text", "hash_count", "ret_count", "fav_count"]
tw_data = pd.DataFrame(columns = column_names)
tw_data
Account | gen_date | text | in_url | in_media | hash_text | hash_count | ret_count | fav_count |
---|---|---|---|---|---|---|---|---|
0 rows × 9 columns
req_counter = 0
for idx, data in tqdm(database.iterrows()):
    acc_name = data['Official Account']
    # Paginator follows next_token between pages automatically; flatten()
    # yields up to 2,000 tweets per account across pages.
    for tweet in tweepy.Paginator(
            client.search_all_tweets,
            query="from:" + data['Official Account']
                  + " lang:en -is:retweet -is:reply -is:quote -is:nullcast",
            start_time=data['Start'], end_time=data['Add'],
            max_results=500).flatten(limit=2000):
        req_counter += 1
        if tweet is None:
            continue
        if req_counter == 900:
            time.sleep(901)  # wait out the v1.1 15-minute window
            req_counter = 0
        target_id = tweet.id
        ex_stat = api.get_status(target_id)
        gen_date = ex_stat._json['created_at']
        cont = ex_stat._json['text']
        try:
            urls = ex_stat.entities['urls'][0]
            in_url = urls['url']
        except (KeyError, IndexError):
            in_url = "No URL included"
        try:
            media = ex_stat.entities['media'][0]
            in_media = media['type']
        except (KeyError, IndexError):
            in_media = "No Media included"
        hash_cont = []
        for hashtag in ex_stat.entities['hashtags']:
            hash_cont.append(hashtag['text'])
        hash_num = len(ex_stat.entities['hashtags'])
        ret_num = ex_stat._json['retweet_count']
        fav_num = ex_stat._json['favorite_count']
        rows = [acc_name, gen_date, cont, in_url, in_media, hash_cont,
                hash_num, ret_num, fav_num]
        tw_data.loc[len(tw_data)] = rows
The "Rate limit exceeded. Sleeping for ..." messages below come from wait_on_rate_limit=True on the Client: tweepy pauses automatically whenever the full-archive search quota is hit, while the manual req_counter guards the separate v1.1 get_status limit.
1it [00:00, 5.46it/s]Rate limit exceeded. Sleeping for 900 seconds.
4it [36:07, 665.03s/it]Rate limit exceeded. Sleeping for 822 seconds.
6it [49:50, 530.64s/it]Rate limit exceeded. Sleeping for 900 seconds.
8it [1:04:51, 495.97s/it]Rate limit exceeded. Sleeping for 900 seconds.
Rate limit exceeded. Sleeping for 897 seconds.
9it [1:34:52, 803.19s/it]Rate limit exceeded. Sleeping for 899 seconds.
10it [1:49:51, 827.37s/it]Rate limit exceeded. Sleeping for 901 seconds.
11it [2:05:37, 858.57s/it]Rate limit exceeded. Sleeping for 856 seconds.
13it [2:19:53, 673.35s/it]Rate limit exceeded. Sleeping for 900 seconds.
14it [2:34:54, 726.88s/it]Rate limit exceeded. Sleeping for 900 seconds.
15it [2:49:54, 770.48s/it]Rate limit exceeded. Sleeping for 901 seconds.
16it [3:06:34, 830.94s/it]Rate limit exceeded. Sleeping for 802 seconds.
18it [3:19:56, 646.05s/it]Rate limit exceeded. Sleeping for 900 seconds.
19it [3:34:56, 705.78s/it]Rate limit exceeded. Sleeping for 901 seconds.
20it [3:49:57, 754.91s/it]Rate limit exceeded. Sleeping for 901 seconds.
24it [4:24:02, 524.36s/it]Rate limit exceeded. Sleeping for 705 seconds.
25it [4:35:47, 576.91s/it]Rate limit exceeded. Sleeping for 901 seconds.
26it [4:50:49, 672.14s/it]Rate limit exceeded. Sleeping for 900 seconds.
28it [5:06:14, 576.69s/it]Rate limit exceeded. Sleeping for 876 seconds.
30it [5:20:51, 522.05s/it]Rate limit exceeded. Sleeping for 900 seconds.
32it [5:35:51, 496.16s/it]Rate limit exceeded. Sleeping for 901 seconds.
33it [5:50:55, 579.58s/it]Rate limit exceeded. Sleeping for 898 seconds.
35it [6:05:53, 531.46s/it]Rate limit exceeded. Sleeping for 901 seconds.
36it [6:20:54, 608.61s/it]Rate limit exceeded. Sleeping for 901 seconds.
37it [6:35:56, 675.80s/it]Rate limit exceeded. Sleeping for 900 seconds.
38it [6:50:56, 731.23s/it]Rate limit exceeded. Sleeping for 901 seconds.
39it [7:05:57, 775.58s/it]Rate limit exceeded. Sleeping for 901 seconds.
40it [7:20:58, 809.71s/it]Rate limit exceeded. Sleeping for 901 seconds.
41it [7:52:41, 1115.17s/it]Rate limit exceeded. Sleeping for 850 seconds.
43it [8:06:52, 808.84s/it] Rate limit exceeded. Sleeping for 900 seconds.
44it [8:22:14, 684.88s/it]
tw_data
  | Account | gen_date | text | in_url | in_media | hash_text | hash_count | ret_count | fav_count |
---|---|---|---|---|---|---|---|---|---|
0 | BestBuy | Tue Nov 14 18:59:07 +0000 2017 | Gear up.\n\nGet the #StarWarsBattlefrontII Eli... | https://t.co/qaVjKvyqv6 | No Media included | [StarWarsBattlefrontII] | 1 | 30 | 61 |
1 | BestBuy | Tue Nov 14 15:00:01 +0000 2017 | .@saradietschy proves that no matter how big o... | No URL included | No Media included | [] | 0 | 21 | 129 |
2 | BestBuy | Mon Nov 13 18:00:12 +0000 2017 | They’ll be dashing like Dasher and dancing lik... | https://t.co/bpibw0wXFH | No Media included | [] | 0 | 19 | 39 |
3 | BestBuy | Mon Nov 13 15:00:09 +0000 2017 | Got a fav song on JAY-Z’s 4:44 album?\nTell us... | https://t.co/AuwgDRsUjB | No Media included | [BestBuyTicketsNY, Sweepstakes] | 2 | 27 | 93 |
4 | BestBuy | Sun Nov 12 15:00:09 +0000 2017 | The AMD Ryzen Processor with Radeon Vega Graph... | https://t.co/2gOQRxYv6h | No Media included | [] | 0 | 26 | 52 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2954 | riteaid | Mon Aug 07 01:01:52 +0000 2017 | The DreamShip takes flight at the @3RiversRega... | https://t.co/wuKidTA8vY | No Media included | [] | 0 | 5 | 11 |
2955 | riteaid | Sun Aug 06 12:00:02 +0000 2017 | Why wait? Get your flu shot now through August... | No URL included | photo | [] | 0 | 4 | 8 |
2956 | riteaid | Fri Aug 04 13:30:08 +0000 2017 | The more points you earn with wellness+Plenti,... | https://t.co/qQhIuoId6G | No Media included | [] | 0 | 5 | 5 |
2957 | riteaid | Wed Aug 02 20:54:03 +0000 2017 | It's never too early to get your flu shot, so ... | https://t.co/uvjriPn3iv | No Media included | [] | 0 | 5 | 9 |
2958 | riteaid | Tue Jul 25 12:00:03 +0000 2017 | Make healthy eating fun with a picnic lunch of... | No URL included | photo | [] | 0 | 2 | 4 |
2959 rows × 9 columns
tw_data.to_csv("FinalCheck_cycle25.csv")
PAGINATION SECTION for one account
column_names = ["Account","gen_date", "text", "in_url", "in_media", "hash_text", "hash_count", "ret_count", "fav_count"]
tw_data = pd.DataFrame(columns = column_names)
tw_data
start_time = "2017-07-25"+"T00:00:01Z"
end_time = "2018-03-21"+"T00:00:01Z"
acc_name = "riteaid"
req_counter = 0
for tweet in tqdm(tweepy.Paginator(client.search_all_tweets, query= "from: "+acc_name+ " lang:en -is:retweet -is:reply -is:quote -is:nullcast",
start_time=start_time, end_time=end_time, max_results=500).flatten(limit=2000)):
req_counter += 1
if req_counter == 900:
time.sleep(901)
req_counter = 0
target_id = tweet.id
ex_stat = api.get_status(target_id)
gen_date = ex_stat._json['created_at']
cont = ex_stat._json['text']
try:
urls = ex_stat.entities['urls'][0]
in_url = urls['url']
except:
in_url = "No URL included"
try:
media = ex_stat.entities['media'][0]
in_media = media['type']
except:
in_media = "No Media included"
hash_cont = []
for hashtag in ex_stat.entities['hashtags']:
hash_cont.append(hashtag['text'])
hash_num = len(ex_stat.entities['hashtags'])
ret_num = ex_stat._json['retweet_count']
fav_num = ex_stat._json['favorite_count']
# for i in tweepy.Paginator(client.search_all_tweets, query="in_reply_to_status_id: "+str(target_id),
# start_time=start_time, end_time=end_time, max_results=500).flatten(limit=100000):
# for i in client.search_all_tweets(query="in_reply_to_status_id: "+str(target_id), start_time=start_time, end_time=end_time, max_results=200):
# idx += 1
# rep_num = idx
rows = [acc_name, gen_date, cont, in_url, in_media, hash_cont, hash_num, ret_num, fav_num]
tw_data.loc[len(tw_data)] = rows
tw_data
tw_data.to_csv("FinalCheck_riteaid.csv")
column_names = ["Account","gen_date", "text", "in_url", "in_media", "hash_text", "hash_count", "ret_count", "fav_count"]
tw_data = pd.DataFrame(columns = column_names)
tw_data
start_time = "2018-11-08"+"T00:00:01Z"
end_time = "2019-10-29"+"T00:00:01Z"
acc_name = "SportsmansWH"
req_counter = 0
for tweet in tqdm(tweepy.Paginator(client.search_all_tweets, query= "from: "+acc_name+ " lang:en -is:retweet -is:reply -is:quote -is:nullcast",
start_time=start_time, end_time=end_time, max_results=500).flatten(limit=2000)):
req_counter += 1
if req_counter == 900:
time.sleep(901)
req_counter = 0
target_id = tweet.id
ex_stat = api.get_status(target_id)
gen_date = ex_stat._json['created_at']
cont = ex_stat._json['text']
try:
urls = ex_stat.entities['urls'][0]
in_url = urls['url']
except:
in_url = "No URL included"
try:
media = ex_stat.entities['media'][0]
in_media = media['type']
except:
in_media = "No Media included"
hash_cont = []
for hashtag in ex_stat.entities['hashtags']:
hash_cont.append(hashtag['text'])
hash_num = len(ex_stat.entities['hashtags'])
ret_num = ex_stat._json['retweet_count']
fav_num = ex_stat._json['favorite_count']
# for i in tweepy.Paginator(client.search_all_tweets, query="in_reply_to_status_id: "+str(target_id),
# start_time=start_time, end_time=end_time, max_results=500).flatten(limit=100000):
# for i in client.search_all_tweets(query="in_reply_to_status_id: "+str(target_id), start_time=start_time, end_time=end_time, max_results=200):
# idx += 1
# rep_num = idx
rows = [acc_name, gen_date, cont, in_url, in_media, hash_cont, hash_num, ret_num, fav_num]
tw_data.loc[len(tw_data)] = rows
tw_data
tw_data.to_csv("FinalCheck_sportsmansWH.csv")
column_names = ["Account","gen_date", "text", "in_url", "in_media", "hash_text", "hash_count", "ret_count", "fav_count"]
tw_data = pd.DataFrame(columns = column_names)
tw_data
Account | gen_date | text | in_url | in_media | hash_text | hash_count | ret_count | fav_count |
---|---|---|---|---|---|---|---|---|
0 rows × 9 columns
start_time = "2019-04-14"+"T00:00:01Z"
end_time = "2019-09-28"+"T00:00:01Z"
acc_name = "lululemon"
req_counter = 0
for tweet in tqdm(tweepy.Paginator(client.search_all_tweets, query= "from: "+acc_name+ " lang:en -is:retweet -is:reply -is:quote -is:nullcast",
start_time=start_time, end_time=end_time, max_results=500).flatten(limit=2000)):
req_counter += 1
if req_counter == 900:
time.sleep(901)
req_counter = 0
target_id = tweet.id
ex_stat = api.get_status(target_id)
gen_date = ex_stat._json['created_at']
cont = ex_stat._json['text']
try:
urls = ex_stat.entities['urls'][0]
in_url = urls['url']
except:
in_url = "No URL included"
try:
media = ex_stat.entities['media'][0]
in_media = media['type']
except:
in_media = "No Media included"
hash_cont = ex_stat.entities['hashtags']
hash_num = len(ex_stat.entities['hashtags'])
ret_num = ex_stat._json['retweet_count']
fav_num = ex_stat._json['favorite_count']
# for i in tweepy.Paginator(client.search_all_tweets, query="in_reply_to_status_id: "+str(target_id),
# start_time=start_time, end_time=end_time, max_results=500).flatten(limit=100000):
# for i in client.search_all_tweets(query="in_reply_to_status_id: "+str(target_id), start_time=start_time, end_time=end_time, max_results=200):
# idx += 1
# rep_num = idx
rows = [acc_name, gen_date, cont, in_url, in_media, hash_cont, hash_num, ret_num, fav_num]
tw_data.loc[len(tw_data)] = rows
51it [00:36, 1.40it/s]
tw_data
  | Account | gen_date | text | in_url | in_media | hash_text | hash_count | ret_count | fav_count |
---|---|---|---|---|---|---|---|---|---|
0 | lululemon | Wed Aug 28 02:30:40 +0000 2019 | Vetted by @lululemonmen , our best men's worko... | https://t.co/QZ8sqhYMXr | No Media included | [] | 0 | 46 | 47 |
1 | lululemon | Mon Aug 26 23:00:11 +0000 2019 | We’re going beyond the buzzwords and giving yo... | https://t.co/dloPhlz2EY | No Media included | [] | 0 | 5 | 38 |
2 | lululemon | Sat Aug 17 04:10:55 +0000 2019 | Has anyone seen @craig_mcmorris fanny pack? ht... | https://t.co/zeyycJs9ni | photo | [{'text': 'SeaWheeze', 'indices': [68, 78]}] | 1 | 2 | 23 |
3 | lululemon | Fri Aug 16 15:55:00 +0000 2019 | There are 10,000 people running #SeaWheeze. Bu... | https://t.co/saxcECmR2L | No Media included | [{'text': 'SeaWheeze', 'indices': [32, 42]}] | 1 | 1 | 25 |
4 | lululemon | Tue Aug 06 23:30:42 +0000 2019 | Her relationship with her boobs is complicated... | https://t.co/LO3UDF7F9v | No Media included | [] | 0 | 7 | 28 |
5 | lululemon | Tue Aug 06 19:07:31 +0000 2019 | Where there's boobs, there's truths...and duet... | https://t.co/xzFESvHa57 | No Media included | [] | 0 | 5 | 20 |
6 | lululemon | Wed Jul 31 00:08:23 +0000 2019 | Introducing Boob Truth Tuesdays—You’ll laugh, ... | https://t.co/dcCXqjTmrm | No Media included | [] | 0 | 6 | 31 |
7 | lululemon | Tue Jul 23 23:00:02 +0000 2019 | Better, together—our full collection of Men's ... | https://t.co/8JKFYfaeyS | No Media included | [] | 0 | 4 | 47 |
8 | lululemon | Thu Jul 18 00:00:20 +0000 2019 | Sign-up to be the first to know about the new ... | https://t.co/9unfTAajwb | No Media included | [] | 0 | 5 | 45 |
9 | lululemon | Sun Jul 14 00:10:01 +0000 2019 | Need healing? A confidence-boost? Rest? 3 reas... | https://t.co/lkggudeIFi | No Media included | [] | 0 | 10 | 40 |
10 | lululemon | Sat Jul 06 22:37:57 +0000 2019 | Professional quarterback @NickFoles’ secret to... | https://t.co/j6mPl1Wnqn | No Media included | [{'text': 'lululemon', 'indices': [93, 103]}] | 1 | 32 | 295 |
11 | lululemon | Fri Jul 05 19:02:33 +0000 2019 | More time for yoga–ICYMI, Elite Ambassador and... | https://t.co/LlaKEFYaeB | No Media included | [] | 0 | 3 | 39 |
12 | lululemon | Sun Jun 30 20:43:48 +0000 2019 | In honour of 50 years of #pride we’ve asked so... | https://t.co/sIUMuGrBYb | No Media included | [{'text': 'pride', 'indices': [25, 31]}] | 1 | 6 | 70 |
13 | lululemon | Sat Jun 29 22:59:01 +0000 2019 | In any profession, work stress is real. Here a... | https://t.co/Oyutp9qM0i | No Media included | [{'text': 'Chicago', 'indices': [56, 64]}] | 1 | 12 | 87 |
14 | lululemon | Sat Jun 22 02:00:00 +0000 2019 | “Yoga allows me to enjoy the present moment.” ... | No URL included | No Media included | [{'text': 'internationalyogaday', 'indices': [... | 2 | 4 | 36 |
15 | lululemon | Sat Jun 22 01:00:12 +0000 2019 | Peace Coleman from I Grow Chicago shares what ... | https://t.co/nfLQG2spHP | No Media included | [] | 0 | 1 | 15 |
16 | lululemon | Sat Jun 22 01:00:00 +0000 2019 | “Yoga is an intimate date with myself.” - Elit... | No URL included | No Media included | [{'text': 'internationalyogaday', 'indices': [... | 2 | 9 | 52 |
17 | lululemon | Sat Jun 22 00:30:00 +0000 2019 | “Yoga has taught me to make peace with the unk... | No URL included | No Media included | [{'text': 'internationalyogaday', 'indices': [... | 2 | 5 | 46 |
18 | lululemon | Sat Jun 22 00:00:00 +0000 2019 | "Yoga is a daily dose of energy, strength, goo... | No URL included | No Media included | [{'text': 'internationalyogaday', 'indices': [... | 2 | 2 | 32 |
19 | lululemon | Fri Jun 21 23:30:00 +0000 2019 | “Yoga puts me completely in control of my mood... | No URL included | No Media included | [{'text': 'internationalyogaday', 'indices': [... | 2 | 0 | 25 |
20 | lululemon | Fri Jun 21 23:00:12 +0000 2019 | Adria Moses from @DETBoxingGym took her trauma... | https://t.co/jFcL3YEzEX | No Media included | [] | 0 | 0 | 14 |
21 | lululemon | Fri Jun 21 23:00:00 +0000 2019 | “Yoga helps me to be mindful.” - Elite Ambassa... | No URL included | No Media included | [{'text': 'internationalyogaday', 'indices': [... | 2 | 0 | 8 |
22 | lululemon | Fri Jun 21 22:30:00 +0000 2019 | “Yoga turned me from an inflexible jock into a... | No URL included | No Media included | [{'text': 'internationalyogaday', 'indices': [... | 2 | 1 | 18 |
23 | lululemon | Fri Jun 21 22:00:00 +0000 2019 | “Yoga helps create more space in my mind and b... | No URL included | No Media included | [{'text': 'internationalyogaday', 'indices': [... | 2 | 3 | 23 |
24 | lululemon | Fri Jun 21 21:30:00 +0000 2019 | "Yoga clears my head and heals my body.” - Eli... | No URL included | No Media included | [{'text': 'internationalyogaday', 'indices': [... | 2 | 2 | 28 |
25 | lululemon | Fri Jun 21 21:00:00 +0000 2019 | “Yoga is the tool I use to delete my back pain... | No URL included | No Media included | [{'text': 'internationalyogaday', 'indices': [... | 2 | 1 | 15 |
26 | lululemon | Fri Jun 21 20:30:00 +0000 2019 | “Yoga exposed me completely.” - Elite Ambassad... | No URL included | No Media included | [{'text': 'internationalyogaday', 'indices': [... | 2 | 0 | 19 |
27 | lululemon | Fri Jun 21 20:07:14 +0000 2019 | See what helped @AlexMazerolle discover her pa... | https://t.co/0Zo79KXgZO | No Media included | [] | 0 | 1 | 13 |
28 | lululemon | Fri Jun 21 20:00:00 +0000 2019 | “Yoga has extended my career.” - Elite Ambassa... | No URL included | No Media included | [{'text': 'internationalyogaday', 'indices': [... | 2 | 3 | 15 |
29 | lululemon | Fri Jun 21 19:00:00 +0000 2019 | “Yoga has taught me to embrace the moment.” - ... | No URL included | No Media included | [{'text': 'internationalyogaday', 'indices': [... | 2 | 2 | 15 |
30 | lululemon | Fri Jun 21 18:30:00 +0000 2019 | “Yoga makes me the most authentic and courageo... | https://t.co/QSSYlBvucy | No Media included | [] | 0 | 2 | 11 |
31 | lululemon | Fri Jun 21 18:00:00 +0000 2019 | “Yoga helped me win medals at the highest leve... | https://t.co/XbVZx7Tduj | No Media included | [] | 0 | 4 | 31 |
32 | lululemon | Fri Jun 21 17:30:00 +0000 2019 | “Yoga is my main method of recovery.” - Elite ... | No URL included | No Media included | [{'text': 'internationalyogaday', 'indices': [... | 2 | 0 | 16 |
33 | lululemon | Fri Jun 21 17:00:01 +0000 2019 | “Yoga has taught me to breathe through my chal... | No URL included | No Media included | [{'text': 'internationalyogaday', 'indices': [... | 2 | 6 | 18 |
34 | lululemon | Fri Jun 21 16:30:00 +0000 2019 | It's so much more than just poses. How has yog... | No URL included | No Media included | [{'text': 'internationalyogaday', 'indices': [... | 2 | 9 | 33 |
35 | lululemon | Fri Jun 21 16:00:28 +0000 2019 | It’s more powerful than you think: https://t.c... | https://t.co/HHsX6D1roo | photo | [{'text': 'internationalyogaday', 'indices': [... | 2 | 5 | 21 |
36 | lululemon | Tue Jun 18 23:00:12 +0000 2019 | Gahh, we’re so excited! Made with good ingredi... | https://t.co/rZhixCe3eg | No Media included | [] | 0 | 21 | 176 |
37 | lululemon | Tue Jun 18 16:52:12 +0000 2019 | Eau de burpees, sweaty hair, hot yoga face...i... | https://t.co/LoLIne0jN1 | No Media included | [] | 0 | 1 | 61 |
38 | lululemon | Wed Jun 05 21:13:00 +0000 2019 | 323,577 collective kilometers down, how may mo... | https://t.co/mDHcER2NFa | No Media included | [{'text': 'GlobalRunningDay', 'indices': [65, ... | 1 | 3 | 16 |
39 | lululemon | Thu May 30 23:00:06 +0000 2019 | Grab your run crew, hit the pavement and crush... | https://t.co/i7xaAwpvpB | No Media included | [] | 0 | 7 | 38 |
40 | lululemon | Fri May 24 16:01:01 +0000 2019 | On #GlobalRunningDay, let’s show the world the... | https://t.co/cuEbAnMZxP | No Media included | [{'text': 'GlobalRunningDay', 'indices': [3, 2... | 1 | 8 | 46 |
41 | lululemon | Tue May 21 22:18:01 +0000 2019 | Vancouver's own, @robbiedxc is our newest Glob... | https://t.co/xE6zPUg2Jh | No Media included | [] | 0 | 5 | 64 |
42 | lululemon | Sun May 12 12:51:05 +0000 2019 | Happy #MothersDay We’re celebrating our global... | https://t.co/4XxxgbNmCV | No Media included | [{'text': 'MothersDay', 'indices': [6, 17]}] | 1 | 1 | 31 |
43 | lululemon | Sat May 11 18:30:20 +0000 2019 | Recover from your run to keep moving and feeli... | https://t.co/GHNABwSQ3i | No Media included | [] | 0 | 2 | 35 |
44 | lululemon | Sat May 04 18:20:20 +0000 2019 | What sound runners eat and when? Find out in ... | https://t.co/4g4RkkhYSq | No Media included | [] | 0 | 3 | 35 |
45 | lululemon | Fri May 03 22:45:02 +0000 2019 | Congratulations Sun Choe, lululemon’s Chief Pr... | https://t.co/tyQY8tKQ86 | No Media included | [] | 0 | 11 | 73 |
46 | lululemon | Wed May 01 04:18:02 +0000 2019 | Reflecting on growing up, falling down, and fo... | https://t.co/4uixGLRtH1 | No Media included | [] | 0 | 11 | 46 |
47 | lululemon | Sat Apr 27 18:06:20 +0000 2019 | Learn how trail running can help build strengt... | https://t.co/KcjS0pbKm3 | No Media included | [] | 0 | 7 | 32 |
48 | lululemon | Thu Apr 18 00:32:00 +0000 2019 | Running is how he brings people together to ex... | https://t.co/jjsd4scINo | No Media included | [] | 0 | 3 | 23 |
49 | lululemon | Wed Apr 17 20:14:00 +0000 2019 | Our newest Global Run Ambassador opens up and ... | https://t.co/8nE3o8mv3V | No Media included | [] | 0 | 7 | 29 |
50 | lululemon | Sun Apr 14 20:18:19 +0000 2019 | Learn how the track can improve pace, efficien... | https://t.co/yY2xmADYQl | No Media included | [] | 0 | 3 | 27 |
tw_data.to_csv("FinalCheck_lululemon.csv")