
Survivor: Outwit, Outplay, Out...analyze? (Part 1) Collecting The Data

Analyzing the Game of Survivor -- Collecting The Data (1)

The Data of Survivor

Recently, I have been interested in the CBS reality/game show Survivor. Part of this is because of the extra time I have had due to the recent circumstances surrounding COVID-19, and part is because the show greatly resembles other games that I enjoy playing (social deduction games, etc.). If you have never watched it, I would highly suggest it! I'm not a fan of reality TV, but this show is a great combination of the elements of reality TV and a game show that make it enthralling.

As a data scientist, after watching a few seasons I became interested in some of the results of the show. It seems that certain personalities are more or less likely to win, or to be voted out at any given point in the game. Additionally, there seems to be a bit of a juggling act between the importance of challenges, strategy and the social game.

This is the beginning of a series of posts relating to Survivor, data that can be gathered from it, analysis we can make with some of the sources, and perhaps ending in a few machine learning or statistical learning applications (to try to estimate the season winner, or who will be voted out on which days, etc!)

A quick primer on Survivor

To give a brief overview of some of the terms and information I will be using throughout this post, I have written a brief summary here. From Wikipedia:

[Survivor] features a group of contestants deliberately marooned in an isolated location, where they must provide food, water, fire, and shelter for themselves. The contestants compete in challenges for rewards and immunity from elimination. The contestants are progressively eliminated from the game as they are voted out by their fellow-contestants until only one remains to be awarded the grand prize and named the "Sole Survivor".

In Survivor, there are a few things of note. First:

  1. Survivors must survive in the elements with one another as a tribe. Much of the game focuses on them being able to get basic necessities. However, it is hardly a game about survival in the wilderness, despite the name. The majority of the game is a social and challenge-based game.

  2. Every episode, there is one elimination from the game. Contestants vote players off by majority votes.

  3. In the beginning of the season, there are two or more tribes. These groups are separated from one another, and perform in challenges collectively.

  4. Reward challenges provide a chance for contestants (called castaways by the show) to win luxury prizes, like food, survival gear, or other items of interest. As the game progresses, the rewards usually get more and more enticing.

  5. Immunity challenges provide a strategic advantage in the game. When there are two (or more) tribes, winning an immunity challenge allows the whole tribe to avoid elimination -- someone from the other tribe must leave. When there is one tribe, immunity challenges are individual, which prevents that person from being voted out.

  6. The game changes substantially when the two (or more) tribes are merged into one, usually around halfway through the game. This is known as the merge and from this point on, it is an individual game.

  7. Players tend to work in groups (or alliances) to achieve goals in the game. They can act as a voting bloc.

  8. The game of Survivor is edited after the whole season is filmed -- meaning the people editing often know who the winner will be. This is important to keep in mind.

  9. Additionally, each episode there are some number of confessionals -- which are segments where contestants speak to the camera away from the other contestants. They may complain, for instance, about a particular contestant, or discuss other parts of the game.

While there is more to the show than just this (and it has evolved quite a bit over time), for our purposes it is enough to understand these basic ideas of the game before jumping in. If you're curious, Season 7 (Pearl Islands) is one of my favorite seasons and is a great place to start!

Detail of: The ETL Process

The first step in this process, of course, is getting the actual data to use. To this end, I have actually gone quite far in collecting and creating a workflow to update data as the seasons start. This behind-the-scenes work is a bit more data engineering than data science, but is probably the most important part of any of the analysis I will be doing!

ETL stands for Extract, Transform, Load, the three steps of building a data pipeline. This structure can describe almost any process that uses data. As data scientists, we spend most of our time thinking about the Transform part of this equation.

You may not realize it, but any data analysis work can usually be abstracted to fit into these three broad categories. In many toy examples, both on this site and elsewhere, the Extract portion is limited to loading an Excel or CSV file before fitting a model. However, in real life (and in any job as a data scientist!) how we gain access to the data, and what form it is in, is of crucial importance.

Don't think this is part of data science? Think again! You may have heard the line that more than 80% of a data scientist's time is spent cleaning data. While the reliability of that figure is somewhat questionable (check out this survey for more recent results from 2018), data cleaning is without a doubt one of the most time-consuming and important parts of a data scientist's job.

Without good, clean, and easily accessible data, all of the analytical muscle and machine learning techniques you may use will be to no avail. As the old adage goes -- garbage in, garbage out.

Without further ado, I want to detail some of the goals of this project:

  1. Find reliable data sources related to as many elements of the game of Survivor as I can
  2. Create reproducible code to pull this information moving forward (...Extract...)
  3. Connect different data sources together, generate metrics based on them, and clean the data from these sources (...Transform...)
  4. Load this information into a relational database
  5. Update this information on a daily basis (as new seasons come out, etc.)
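To make the Extract, Transform and Load steps in these goals concrete, here is a minimal sketch using only the Python standard library. It is not the pipeline code from this project: the input rows, the challenge_summary table and the function names are all made up for illustration, and an in-memory SQLite database stands in for Postgres.

```python
import csv
import io
import sqlite3

# Hypothetical raw input standing in for one scraped source (made-up values).
RAW_CSV = "name,season,challenge_wins\n Alice ,7,2\nBob,7,5\n"


def extract(text):
    """Extract: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: strip stray whitespace and cast numeric fields."""
    return [
        (row["name"].strip(), int(row["season"]), int(row["challenge_wins"]))
        for row in rows
    ]


def load(records, conn):
    """Load: write the cleaned records into a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS challenge_summary "
        "(name TEXT, season INTEGER, challenge_wins INTEGER)"
    )
    conn.executemany("INSERT INTO challenge_summary VALUES (?, ?, ?)", records)
    conn.commit()


# SQLite in memory is used here only so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute(
    "SELECT name, season, challenge_wins FROM challenge_summary ORDER BY name"
).fetchall()
print(loaded)
```

Keeping the three steps as separate functions means each one can be rerun or tested on its own, which matters once a scheduler is calling them.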

To this end, I have written a series of scripts to generalize this process. I then use Apache Airflow to run the code periodically and store the results in a Postgres database. This all lives on my home server. The Airflow tasks run asynchronously on 5 Raspberry Pis. In the future, I may detail this infrastructure work (which is outside the realm of data science, but may be of interest nonetheless).

This whole process took me roughly 1 month. It was well worth it, though -- the data I have collected is rich and comes from many different sources, including

  1. True Dorks data on statistics on challenges (Excel spreadsheets)
  2. This collection of confessional data from various seasons, with every confessional listed for each episode (Word documents)
  3. The Survivor Wiki for information about episodes, seasons, contestants, tribes and alliances. (HTML, webpages)
  4. Pushshift.io data for the Reddit r/survivor subreddit, where people discuss all things related to Survivor (API)
  5. Caunce Character Types for each of the different contestants on the show (Excel)

As you can imagine, combining all of these data sources to tell a cohesive story was quite a challenge, but a worthwhile one all the same!

The remainder of this post discusses the data that I have collected, how it has been saved, and what I plan to do with it!

Querying the Database

I have stored all of the results in a Postgres database, hosted on my home server. There is information here that should be kept secret, like my password, so I will access it using environment variables. These were saved in the conda environment I have specified for this project, survivor_scraping.

In [6]:
import os
from sqlalchemy import create_engine
import pandas as pd
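In [2]:
# Read the connection settings from environment variables so no secrets live in the notebook
pg_un, pg_pw, pg_ip, pg_port = [os.getenv(x) for x in ['PG_UN', 'PG_PW', 'PG_IP', 'PG_PORT']]
In [3]:
def pg_uri(un, pw, ip, port):
    """Build the PostgreSQL connection URI that SQLAlchemy expects."""
    return f'postgresql://{un}:{pw}@{ip}:{port}'
In [4]:
# Create the engine used by every query in this post
eng = create_engine(pg_uri(pg_un, pg_pw, pg_ip, pg_port))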
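One caveat with building the URI by plain string formatting is that a password containing characters such as @ or / can break URI parsing, and a missing environment variable silently becomes the string 'None'. The following is a hedged variant, not the code used in this project: read_pg_settings, pg_uri_escaped and the example credentials are hypothetical names and values introduced for illustration.

```python
import os
from urllib.parse import quote


def read_pg_settings(env=os.environ):
    """Collect the connection settings, failing early if any are missing."""
    keys = ["PG_UN", "PG_PW", "PG_IP", "PG_PORT"]
    missing = [key for key in keys if not env.get(key)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {missing}")
    return [env[key] for key in keys]


def pg_uri_escaped(un, pw, ip, port):
    """Build a PostgreSQL URI, percent-encoding the username and password."""
    return f"postgresql://{quote(un, safe='')}:{quote(pw, safe='')}@{ip}:{port}"


# Made-up credentials, passed as a dict instead of the real environment.
example_env = {
    "PG_UN": "survivor",
    "PG_PW": "p@ss/word",
    "PG_IP": "127.0.0.1",
    "PG_PORT": "5432",
}
example_uri = pg_uri_escaped(*read_pg_settings(example_env))
print(example_uri)  # postgresql://survivor:p%40ss%2Fword@127.0.0.1:5432
```

The resulting string can be passed to create_engine in the same way as the output of pg_uri.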

I will now use this engine, as well as the handy read_sql function of pandas, to read in some of the tables and show some of the interesting data we have here!

True Dorks Data

For the True Dorks data, we have a few different tables that were extracted from here. The first is the episode_performance_stats table, which details the results from the challenges and other events over the course of the entire episode.

In [12]:
pd.options.display.max_columns = 100
In [13]:
pd.read_sql('SELECT * FROM survivor.episode_performance_stats LIMIT 100', con=eng)
Out[13]:
index challenge_wins challenge_appearances sitout voted_for_bootee votes_against_player total_number_of_votes_in_episode tribal_council_appearances votes_at_council number_of_jury_votes total_number_of_jury_votes number_of_days_spent_in_episode days_in_exile individual_reward_challenge_appearances individual_reward_challenge_wins individual_immunity_challenge_appearances individual_immunity_challenge_wins tribal_reward_challenge_appearances tribal_reward_challenge_wins tribal_immunity_challenge_appearances tribal_immunity_challenge_wins season_id tribal_reward_challenge_second_of_three_place tribal_immunity_challenge_second_of_three_place fire_immunity_challenge tribal_immunity_challenge_third_place contestant_id episode_id created updated
0 5803 0.111111 0.211111 0.0 1 1.0 10.0 1.0 0.0 0.0 0.0 3.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 2.0 1.0 40.0 None None 1.0 None 740 689.0 2020-07-19 00:58:07.828136+00:00 2020-07-19 00:58:07.828136+00:00
1 5804 0.000000 0.111111 0.0 1 0.0 9.0 1.0 1.0 0.0 0.0 3.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 40.0 None None 0.0 None 740 690.0 2020-07-19 00:58:07.828136+00:00 2020-07-19 00:58:07.828136+00:00
2 5805 0.000000 0.125000 0.0 0 3.0 8.0 1.0 0.0 0.0 0.0 3.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 40.0 None None 0.0 None 740 691.0 2020-07-19 00:58:07.828136+00:00 2020-07-19 00:58:07.828136+00:00
3 5806 0.142857 0.142857 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 2.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 40.0 None None 0.0 None 740 692.0 2020-07-19 00:58:07.828136+00:00 2020-07-19 00:58:07.828136+00:00
4 5807 0.000000 0.200000 0.0 1 0.0 5.0 1.0 1.0 0.0 0.0 3.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 40.0 None None 0.0 None 740 693.0 2020-07-19 00:58:07.828136+00:00 2020-07-19 00:58:07.828136+00:00
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
95 5898 0.000000 1.000000 0.0 0 0.0 0.0 0.0 0.0 0.0 16.0 1.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 40.0 None None 0.0 None 750 702.0 2020-07-19 00:58:07.828136+00:00 2020-07-19 00:58:07.828136+00:00
96 5899 0.100000 0.211111 0.0 1 3.0 10.0 1.0 0.0 0.0 0.0 3.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 2.0 1.0 40.0 None None 1.0 None 751 689.0 2020-07-19 00:58:07.828136+00:00 2020-07-19 00:58:07.828136+00:00
97 5900 0.111111 0.111111 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 3.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 40.0 None None 0.0 None 751 690.0 2020-07-19 00:58:07.828136+00:00 2020-07-19 00:58:07.828136+00:00
98 5901 0.125000 0.125000 0.0 0 0.0 0.0 0.0 0.0 0.0 0.0 3.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 40.0 None None 0.0 None 751 691.0 2020-07-19 00:58:07.828136+00:00 2020-07-19 00:58:07.828136+00:00
99 5902 0.000000 0.142857 0.0 1 1.0 9.0 1.0 0.0 0.0 0.0 2.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 40.0 None None 0.0 None 751 692.0 2020-07-19 00:58:07.828136+00:00 2020-07-19 00:58:07.828136+00:00

100 rows × 30 columns

Here, you can see some interesting information. Notice that the rows are keyed on contestant_id, episode_id and season_id. These are foreign keys into three other tables: contestant, episode and season. The information in those tables was pulled from the wiki.

In order to map these to the correct values, there was a semi-automatic process to find the correct contestant, episode and season based on the names. This is why the Transform portion of the ETL process was necessary -- to make sure all of these tables were consistent. While a lot of work, it has made the data much richer than any of these individual sources alone. From here out, you can assume id columns required matching with other data sources to ensure correctness and to conform with relational database paradigms.
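The post does not say how the semi-automatic matching was implemented. One plausible approach, sketched here as an assumption rather than the actual pipeline code, is fuzzy string matching with the standard library's difflib: names that match closely enough are mapped automatically, and anything else returns None so it can be reviewed by hand. The names and contestant_id values are taken from the contestant table shown later in this post; match_name and the 0.8 cutoff are hypothetical.

```python
import difflib


def match_name(raw_name, known_names, cutoff=0.8):
    """Return the closest known name, or None if nothing is similar enough."""
    lookup = {name.lower(): name for name in known_names}
    hits = difflib.get_close_matches(
        raw_name.strip().lower(), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[hits[0]] if hits else None


# Names and ids as they appear in the contestant table.
contestant_ids = {"Abi-Maria Gomes": 5, "Amanda Kimmel": 25, "Adam Gentry": 7}

# A spelling variant from another source still resolves to the same id.
matched = match_name("Abi Maria Gomes", contestant_ids)
print(matched, contestant_ids.get(matched))

# Names with no close match come back as None and need manual review.
unmatched = match_name("Jeff Probst", contestant_ids)
print(unmatched)
```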

Additionally, each table has a created and an updated column. These are present in all of the tables, and were updated most recently on July 19th. If the automated process (using Airflow and the Raspberry Pis) updates these tables, we will see the timestamp at which this was done in these columns. It uses an upsert method, so it should not overwrite the currently used values.
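As a rough illustration of the upsert idea, the sketch below inserts a row, then reruns the same insert with a later timestamp. The table layout, the conflict key and the upsert_vote helper are assumptions made for this example, not the project's schema, and SQLite (3.24 or newer) stands in for Postgres because both accept the same INSERT ... ON CONFLICT ... DO UPDATE form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE vote (
        contestant_id INTEGER,
        episode_id INTEGER,
        voted_for_id INTEGER,
        created TEXT,
        updated TEXT,
        PRIMARY KEY (contestant_id, episode_id)
    )
    """
)

# On a key collision, refresh the data and the updated stamp but keep created.
UPSERT = """
INSERT INTO vote (contestant_id, episode_id, voted_for_id, created, updated)
VALUES (?, ?, ?, ?, ?)
ON CONFLICT (contestant_id, episode_id) DO UPDATE SET
    voted_for_id = excluded.voted_for_id,
    updated = excluded.updated
"""


def upsert_vote(contestant_id, episode_id, voted_for_id, now):
    """Insert a vote, or update the existing row with the same key."""
    conn.execute(UPSERT, (contestant_id, episode_id, voted_for_id, now, now))


upsert_vote(212, 490, 571, "2020-07-11")  # first run inserts the row
upsert_vote(212, 490, 571, "2020-07-19")  # second run updates it in place
rows = conn.execute("SELECT * FROM vote").fetchall()
print(rows)
```

After both runs there is still a single row: created keeps the first timestamp and updated carries the second.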

You can see this table details information at an episode level. For instance, if there was a reward challenge and an immunity challenge, this will have the results from both challenges. Some episodes will have multiple of each (especially toward the end), so this is important to take note of. Challenge wins are fractional if the contestant competed as part of a tribe -- for instance, if I am in a tribe of 4 and we win a challenge, I will get 0.25 challenge wins. The same goes for challenge appearances.
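The fractional credit is simple arithmetic, shown here as a small sketch. The function names are hypothetical, and the assumption that credit from several challenges in one episode is summed is my reading of the table rather than something True Dorks states here; it does appear consistent with the first row above, where challenge_appearances is 0.211111, which equals 1/9 + 1/10.

```python
def fractional_credit(tribe_size):
    """Credit one contestant earns for a challenge played in a tribe of this size."""
    return 1 / tribe_size


def episode_total(tribe_sizes):
    """Sum the credit across every challenge in an episode (assumed to be additive)."""
    return sum(fractional_credit(size) for size in tribe_sizes)


print(fractional_credit(4))               # tribe of 4 -> 0.25
print(round(fractional_credit(9), 6))     # tribe of 9 -> 0.111111
print(round(episode_total([9, 10]), 6))   # two challenges -> 0.211111
print(fractional_credit(1))               # individual challenge -> 1.0
```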

The remainder of the results here should be relatively self-explanatory. More information is listed on the True Dorks site. Thanks, True Dorks!

Additional tables from this source are the immunity_challenge, reward_challenge and vote tables.

In [19]:
pd.read_sql('SELECT * FROM survivor.immunity_challenge LIMIT 100', con=eng)
Out[19]:
index team win_pct total_players_remaining episode_win_pct sitout tc_number season_id contestant_id episode_id win created updated
0 2452 10.0 0.100000 20.0 0.100000 NaN 1.0 0 278 633 1.0 2020-07-18 02:11:20.917866+00:00 2020-07-18 02:11:20.917866+00:00
1 2453 10.0 0.000000 20.0 0.333333 NaN 1.0 0 515 633 0.0 2020-07-18 02:11:20.917866+00:00 2020-07-18 02:11:20.917866+00:00
2 2454 10.0 0.000000 20.0 0.000000 NaN 1.0 0 283 633 0.0 2020-07-18 02:11:20.917866+00:00 2020-07-18 02:11:20.917866+00:00
3 2455 10.0 0.000000 20.0 0.000000 NaN 1.0 0 163 633 0.0 2020-07-18 02:11:20.917866+00:00 2020-07-18 02:11:20.917866+00:00
4 2456 10.0 0.000000 20.0 0.000000 NaN 1.0 0 362 633 0.0 2020-07-18 02:11:20.917866+00:00 2020-07-18 02:11:20.917866+00:00
... ... ... ... ... ... ... ... ... ... ... ... ... ...
95 108 12.0 0.000000 24.0 NaN NaN 1.0 36 674 136 0.0 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
96 109 12.0 0.000000 24.0 NaN NaN 1.0 36 673 136 0.0 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
97 110 12.0 0.000000 24.0 NaN NaN 1.0 36 672 136 0.0 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
98 111 12.0 0.000000 24.0 NaN NaN 1.0 36 671 136 0.0 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
99 112 12.0 0.083333 24.0 NaN NaN 1.0 36 664 136 1.0 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00

100 rows × 13 columns

In [21]:
pd.read_sql('SELECT * FROM survivor.reward_challenge LIMIT 100', con=eng)
Out[21]:
index total_players_remaining challenge_number tc_number season_id sitout contestant_id episode_id win win_pct team episode_win_pct created updated
0 57 24.0 1.0 1.0 36 None 676 136 1.0 0.083333 12.0 None 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
1 58 24.0 1.0 1.0 36 None 653 136 0.0 0.000000 12.0 None 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
2 59 24.0 1.0 1.0 36 None 654 136 0.0 0.000000 12.0 None 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
3 60 24.0 1.0 1.0 36 None 655 136 0.0 0.000000 12.0 None 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
4 61 24.0 1.0 1.0 36 None 656 136 0.0 0.000000 12.0 None 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
95 152 20.0 1.0 5.0 36 None 653 144 0.0 0.000000 9.0 None 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
96 153 20.0 1.0 5.0 36 None 664 144 0.0 0.000000 9.0 None 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
97 154 20.0 1.0 5.0 36 None 667 144 1.0 0.111111 9.0 None 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
98 155 20.0 1.0 5.0 36 None 668 144 1.0 0.111111 9.0 None 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
99 156 20.0 1.0 5.0 36 None 669 144 1.0 0.111111 9.0 None 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00

100 rows × 14 columns

The two tables above are pretty self-explanatory -- they give each contestant's result in an episode for each particular challenge in that episode.

In [22]:
pd.read_sql('SELECT * FROM survivor.vote LIMIT 100', con=eng)
Out[22]:
index total_players_remaining tc_number season_id contestant_id voted_for_id episode_id vote_number created updated
0 58 18 3 5 212 571.0 490 1 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
1 59 18 3 5 6 177.0 490 1 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
2 72 16 5 5 575 6.0 491 1 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
3 73 16 5 5 541 6.0 491 1 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
4 74 16 5 5 43 6.0 491 1 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
... ... ... ... ... ... ... ... ... ... ...
95 17 20 1 11 628 236.0 562 1 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
96 18 20 1 11 616 31.0 562 1 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
97 19 20 1 11 445 31.0 562 1 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
98 29 19 2 11 285 445.0 563 1 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
99 30 19 2 11 236 52.0 563 1 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00

100 rows × 10 columns

The above table, also from True Dorks, lists each vote in each episode. Note that contestant_id and voted_for_id both refer to values in the contestant_season table.
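A natural first use of this table is tallying votes against each contestant at each Tribal Council. The sketch below does that with the first five rows shown above, copied in as plain tuples so it runs without a database connection; in the real table voted_for_id is stored as a float, and the variable names here are my own.

```python
from collections import Counter

# (contestant_id, voted_for_id, episode_id) from the first five rows above.
votes = [
    (212, 571, 490),
    (6, 177, 490),
    (575, 6, 491),
    (541, 6, 491),
    (43, 6, 491),
]

# Count how many votes each contestant received in each episode.
votes_against = Counter(
    (episode_id, voted_for_id) for _, voted_for_id, episode_id in votes
)

for (episode_id, voted_for_id), n_votes in sorted(votes_against.items()):
    print(episode_id, voted_for_id, n_votes)
```

In this sample, contestant 6 receives all three of the votes listed for episode 491.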

Confessionals Data

One of the more interesting data sources here, the confessionals data has the actual words that were said in confessionals by contestants in a select number of seasons. This could make for some very interesting NLP work, which I plan to explore in future articles. It was sourced from the Google Drive here, and was scraped based on the names in the Word documents.

In some cases, as you'll notice below, there is not anything mentioned in the content field. In these cases, it was recorded that that particular contestant_id had spoken, but not what they had said. While this is a bit disappointing, the counts per episode or per contestant can surely be an interesting analysis.
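Counting works even when the content is missing, because each row still records who spoke and in which episode. The sketch below is illustrative only: the contestant and episode ids match the first five rows of the table that follows, but the transcript strings are placeholders rather than the real text.

```python
from collections import defaultdict

# (contestant_id, episode_id, content); None means no transcript was recorded.
confessionals = [
    (38, 262, None),
    (592, 262, None),
    (342, 262, None),
    (756, 701, "<transcript text>"),
    (744, 701, "<transcript text>"),
]

# Per episode: total confessionals, and how many actually carry text.
counts = defaultdict(lambda: {"total": 0, "with_text": 0})
for contestant_id, episode_id, content in confessionals:
    counts[episode_id]["total"] += 1
    if content is not None:
        counts[episode_id]["with_text"] += 1

for episode_id, tally in sorted(counts.items()):
    print(episode_id, tally)
```

Tracking the two counts separately makes it easy to see which episodes can support text analysis and which can only support counts.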

In [29]:
pd.read_sql('SELECT * FROM survivor.confessional LIMIT 100', con=eng)
Out[29]:
index content day n_from_player n_in_episode total_confessionals_in_episode contestant_id episode_id season_id created updated
0 5337 None 7 4 16 5 38.0 262 25 2020-07-11 00:44:59.947552+00:00 2020-07-11 00:45:58.492724+00:00
1 5338 None 7 2 17 4 592.0 262 25 2020-07-11 00:44:59.947552+00:00 2020-07-11 00:45:58.492724+00:00
2 5339 None 7 2 18 4 342.0 262 25 2020-07-11 00:44:59.947552+00:00 2020-07-11 00:45:58.492724+00:00
3 6310 Denise was like, “Hey, guys, what are you thin... 34 5 33 6 756.0 701 40 2020-07-11 00:50:29.685756+00:00 2020-07-11 00:50:29.685756+00:00
4 6556 Last night at Tribal Council, I just said to J... 30 1 4 3 744.0 701 40 2020-07-11 00:50:29.685756+00:00 2020-07-11 00:50:29.685756+00:00
... ... ... ... ... ... ... ... ... ... ... ...
95 4663 Yeah, this whole thing is just a game. Scout's... 22 2 11 6 442.0 238 26 2020-07-11 00:44:59.947552+00:00 2020-07-11 00:45:58.492724+00:00
96 4664 I've been fed up with Eliza since Day 2. I'm t... 22 1 12 2 120.0 238 26 2020-07-11 00:44:59.947552+00:00 2020-07-11 00:45:58.492724+00:00
97 5419 None 26 3 15 4 53.0 269 25 2020-07-11 00:44:59.947552+00:00 2020-07-11 00:45:58.492724+00:00
98 4710 Twila is just insecure. She’s scared, and you ... 21 5 29 6 152.0 237 26 2020-07-11 00:44:59.947552+00:00 2020-07-11 00:45:58.492724+00:00
99 5420 None 26 3 16 4 342.0 269 25 2020-07-11 00:44:59.947552+00:00 2020-07-11 00:45:58.492724+00:00

100 rows × 11 columns

Survivor Wikia

The information scraped from the Survivor wiki is the following:

  1. Episodes
  2. Seasons
  3. Contestants
  4. Tribes
  5. Alliances
  6. Final Words
  7. Story Quotes
  8. Voting Confessionals

Hopefully, this rich data collected by the people at that wiki can be put to good use! In the process, I was also able to identify small corrections, which I pushed up to the wiki. Hopefully, I'll be able to find even more!

Seasons, Episodes and Contestants are the building blocks of what makes Survivor Survivor. I pulled results from the wiki with a "more is better" mentality, so data from the summary and story sections were included as well.

In [41]:
pd.read_sql('SELECT * FROM survivor.episode LIMIT 10', con=eng)
Out[41]:
index summary story challenges trivia image firstbroadcast viewership wiki_link season_episode_number overall_episode_number overall_slot_rating survivor_rating season_id episode_id episode_name created updated
0 488 She Annoys Me Greatly is the season premiere o... Across the islands of Caramoan in the eastern ... Challenge: Water SlaughterTwo members from eac... * A clip of Jeff Probst filming the "39 days,... https://vignette.wikia.nocookie.net/survivor/i... 2013-02-13 894000000.0 https://survivor.fandom.com/wiki/She_Annoys_Me... 1.0 383.0 7.0 2.4 5.0 488.0 She Annoys Me Greatly 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
1 489 Honey Badger is the second episode of Survivor... Brandon is upset that Francesca got voted out ... Reward/Immunity Challenge: Plunge, Pull, PopFo... * This episode marks the first time that Malc... https://vignette.wikia.nocookie.net/survivor/i... 2013-02-20 932000000.0 https://survivor.fandom.com/wiki/Honey_Badger 2.0 384.0 7.0 2.4 5.0 489.0 Honey Badger 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
2 502 * \n\n\tExplore Wikis\n* \n\n\tCommunity Centr... https://vignette.wikia.nocookie.net/survivor/i... 2013-05-12 813000000.0 https://survivor.fandom.com/wiki/Reunion_(Cara... 15.0 397.0 6.0 2.2 5.0 502.0 Reunion (Caramoan) 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
3 27 None None None * \n\n\tExplore Wikis\n* \n\n\tCommunity Centr... https://vignette.wikia.nocookie.net/survivor/i... 2000-08-23 NaN https://survivor.fandom.com/wiki/Survivor:_The... 14.0 14.0 NaN NaN 6.0 27.0 Question of Trust 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
4 490 There's Gonna Be Hell to Pay is the third epis... Reward/Immunity Challenge: Cell Block SeaTribe... * This is the second episode where there is a... https://vignette.wikia.nocookie.net/survivor/i... 2013-02-27 917000000.0 https://survivor.fandom.com/wiki/There%27s_Gon... 3.0 385.0 8.0 2.6 5.0 490.0 There%27s Gonna Be Hell to Pay 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
5 491 Kill or Be Killed is the fourth episode of Sur... Challenge: Head and ShouldersTwo members of ea... * This is the ninth episode of Survivor to fe... https://vignette.wikia.nocookie.net/survivor/i... 2013-03-06 958000000.0 https://survivor.fandom.com/wiki/Kill_or_Be_Ki... 4.0 386.0 7.0 2.6 5.0 491.0 Kill or Be Killed 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
6 492 Persona Non Grata is the fifth episode of Surv... Tribal Council was held at the location of the... Challenge: Nut BucketTwo tribe members would e... * In an interview with Dalton Ross for Entert... https://vignette.wikia.nocookie.net/survivor/i... 2013-03-13 989000000.0 https://survivor.fandom.com/wiki/Persona_Non_G... 5.0 387.0 8.0 2.7 5.0 492.0 Persona Non Grata 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
7 500 Don't Say Anything About My Mom is the penulti... Day 35's Tree Mail was a Sprint HTC Evo 4G LTE... Challenge: Dizzy GillespieEach castaway must t... * This episode marks the first time in Surviv... https://vignette.wikia.nocookie.net/survivor/i... 2013-05-08 NaN https://survivor.fandom.com/wiki/Don%27t_Say_A... 13.0 395.0 NaN NaN 5.0 500.0 Don%27t Say Anything About My Mom 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
8 493 Operation Thunder Dome is the sixth episode of... On Day 14, when the tribes arrived at what loo... Immunity Challenge: Crate OutdoorsEach tribe s... https://vignette.wikia.nocookie.net/survivor/i... 2013-03-20 979000000.0 https://survivor.fandom.com/wiki/Operation_Thu... 6.0 388.0 8.0 2.6 5.0 493.0 Operation Thunder Dome 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
9 494 Tubby Lunchbox is the seventh episode of Survi... After Tribal Council, Phillip and Corinne stil... Challenge: Hot PursuitThe two tribes would rac... * The Reward Challenge was used in "Dangerous... https://vignette.wikia.nocookie.net/survivor/i... 2013-03-27 943000000.0 https://survivor.fandom.com/wiki/Tubby_Lunchbox 7.0 389.0 7.0 2.5 5.0 494.0 Tubby Lunchbox 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
In [42]:
pd.read_sql('SELECT * FROM survivor.season LIMIT 10', con=eng)
Out[42]:
index season_id days n_episodes history location season_number summary n_survivors trivia twists version viewership name type runnerup_0_id runnerup_1_id winner_id filming_started filming_ended showing_started showing_ended viewership_in_millions created updated
0 0 0 39.0 14.0 The season was filmed in the summer of 2017, o... Mamanuca Islands, Fiji 36.0 The jury vote ended in a tie between two playe... 20 * This is the first season since Survivor: Sa... * Ghost Island:[4] The tribe that wins a Rewa... United States NaN Ghost Island Survivor 0 296.0 278 2017-06-05 2017-07-13 2018-02-28 2018-05-23 NaN 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
1 1 1 39.0 15.0 The season was filmed once again in the vicini... Upolu, Samoa 24.0 Relations between the men's tribe and the wome... 18 * The designs for this season's props, such a... * One World Format: The two competing tribes,... United States 1.163600e+09 One World Survivor 574 406.0 546 2011-08-01 2011-09-08 2012-02-15 2012-05-13 1.163600e+09 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
2 2 2 39.0 14.0 The season was filmed shortly after filming Su... Palaui Island, Santa Ana, Cagayan, Philippines 28.0 Cagayan is famed among Survivor seasons for it... 18 * The font used for Cagayan's logo is a modif... * Three Tribes: The castaways are divided int... United States NaN Cagayan Survivor 284 NaN 601 2013-07-10 2013-08-17 2014-02-26 2014-05-21 NaN 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
3 3 3 39.0 15.0 ^1 Matthew and Rob did not vote as they could ... Rio Negro, Amazonas, Brazil 6.0 The winner is Jenna Morasca, a 21-year-old swi... 16 * For the first time, the contestants were di... * Tribe Composition: The sixteen castaways we... United States 1.997000e+09 The Amazon Survivor 603 NaN 415 2002-11-04 2002-12-12 2003-02-13 2003-05-11 1.997000e+09 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
4 4 4 39.0 14.0 Filming for the season was scheduled to start ... Mamanuca Islands, Fiji 33.0 The season used a variant of the tribe divisio... 20 * The font used for this season is a variant ... * Millennials vs. Gen X: The 20 castaways will... United States NaN Millennials vs. Gen X Survivor 516 400.0 331 2016-04-04 2016-05-12 2016-09-21 2016-12-14 NaN 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
5 5 5 39.0 15.0 This season repeats the back-to-back filming s... Caramoan, Camarines Sur, Philippines 26.0 None 20 * The theme of the season is inspired from th... * Fans vs. Favorites: Similar to past season ... United States 1.081500e+09 Caramoan Survivor 405 43.0 123 2012-05-21 2012-06-28 2013-02-13 2013-05-12 1.081500e+09 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
6 6 6 39.0 14.0 Due to overwhelming interest in the show's for... Pulau Tiga, Sabah, Borneo, Malaysia 1.0 Among Survivor fans, the term "Pagonging" has ... 16 * This is the first season in which both trib... * Tribe Composition: The sixteen castaways we... United States 2.830000e+09 Borneo Survivor 627 NaN 257 2000-03-13 2000-04-20 2000-05-31 2000-08-23 2.830000e+09 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
7 7 7 39.0 15.0 Casting for this season was originally done as... Koror, Palau 16.0 Survivor: Micronesia, also known as Survivor: ... 20 * This is the first season to have an even-nu... * Fans vs. Favorites: One tribe, Airai​​, is ... United States 1.361000e+09 Micronesia Survivor 325 NaN 513 2007-10-29 2007-12-06 2008-02-07 2008-05-11 1.361000e+09 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
8 8 8 39.0 15.0 The theme and name of the season were announce... San Juan del Sur, Rivas, Nicaragua 29.0 Survivor: San Juan del Sur, also known as Surv... 18 * Based on the logo and the names of the trib... * Blood vs. Water: Nine pairs of castaways, ea... United States NaN San Juan del Sur Survivor 508 450.0 10 2014-06-01 2014-07-10 2014-09-24 2014-12-17 NaN 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
9 9 9 39.0 15.0 Applications were due June 16, 2006. Mellisa M... Macuata, Vanua Levu, Fiji 14.0 Survivor: Fiji is the fourteenth season of Uni... 19 * Due to Mellisa McNulty dropping out only ho... * Haves vs. Have Nots: The castaways (after th... United States 1.480000e+09 Fiji Survivor 215 265.0 126 2006-10-30 2006-12-07 2007-02-08 2007-05-13 1.480000e+09 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00

Contestants were broken into two tables, to separate the idea of contestants in a particular season vs contestants on the whole. Some recurring characters act very differently season to season, so this information may be useful to separate.

In [43]:
pd.read_sql('SELECT * FROM survivor.contestant LIMIT 10', con=eng)
Out[43]:
index wiki_survivor_text wiki_postsurvivor_text trivia birthdate other_profile hometown current_residence occupation_self_reported hobbies pet_peeves three_words claim_to_fame inspiration three_things most_similar_self_reported reason why_survive previous_season first_name last_name nickname twitter sex image wikia contestant_id created updated
0 0 A superfan of the show, AK started the game wi... None None 1987-08-03 Getting to that Final Tribal Council is AK's u... Adelaide, South Australia None Wedding DJState: South AustraliaTribe: Samatau None None None None None None None None None None AK Knight None None M https://vignette.wikia.nocookie.net/survivor/i... http://survivor.wikia.com/wiki/AK_Knight 1 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
1 2 At the start of the game, Aaron was placed on ... None None 1975-04-25 None Venice, California None Surfing Instructor None None None None None None None None None None Aaron Reisberger None None M http://vignette2.wikia.nocookie.net/survivor/i... http://survivor.wikia.com/wiki/Aaron_Reisberger 3 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
2 3 Originally in the majority Sporty Seven allian... None None 1991-01-07 "I was so close last time, purely by being mys... Melbourne, VictoriaPerth, Western Australia None AFL Premiership Winner None None None None None None None None None None Abbey Holmes None None F https://vignette.wikia.nocookie.net/survivor/i... http://survivor.wikia.com/wiki/Abbey_Holmes 4 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
3 4 Abi-Maria was part of the yellow Tandang tribe... None None 1979-10-21 Name: Abi-Maria Gomes\nSeason 25\nSurvivor: Ph... Los Angeles, CA[3] None Business Student Languages, hiking, dancing, surfing and skiing. Complainers. Driven, creative and charming. None Steve Jobs' words, "Stay Hungry. Stay foolish." None Parvati – she is as charming as I am. The money! I bring social skills and team work. I am a mo... None Abi-Maria Gomes None None F http://vignette2.wikia.nocookie.net/survivor/i... http://survivor.wikia.com/wiki/Abi-Maria_Gomes 5 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
4 5 None None None 1980-09-03 None Naples, Florida None Jewelry Sales & Photographer None None None None None None None None None None Ace Gordon None None M http://vignette2.wikia.nocookie.net/survivor/i... http://survivor.wikia.com/wiki/Ace_Gordon 6 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
5 6 Adam started out on the Rarotonga tribe with P... None None 1978-08-21 None San Diego, California None Copier Sales None None None None None None None None None None Adam Gentry None None M http://vignette1.wikia.nocookie.net/survivor/i... http://survivor.wikia.com/wiki/Adam_Gentry 7 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
6 24 Amanda formed an early alliance with fellow Fe... None None 1984-08-03 Tribe: Heroes\nCurrent Residence: Los Angeles,... Kalispell, Mont. None Hiking GuideAspiring Designer None None None None "God." None None None None None Amanda Kimmel None None F http://vignette3.wikia.nocookie.net/survivor/i... http://survivor.wikia.com/wiki/Amanda_Kimmel 25 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
7 8 None None None None Retrieved from TVNZ.co.nz\nName: Adam\nWith a ... Auckland None Self Employed None None None None None None None None None None Adam O'Brien None None M https://vignette.wikia.nocookie.net/survivor/i... http://survivor.wikia.com/wiki/Adam_O'Brien 9 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
8 9 Alan started as a member of the Levu tribe whe... None None 1985-03-29 Current Residence: Houston, Texas\n Detroit, Mich. None NFL Player Golf, scuba diving, and riding my motorcycle. I can't stand liars or people who feel entitle... Intelligent, athletic, and clever. Being a seventh round draft pick and grinding ... My parents. They are my best friends. I respec... Hot sauce because it makes everything better, ... I honestly don't think you will see gameplay l... Playing football has fed my drive to compete p... Everything I've accomplished I've had to scrat... None Alan Ball None None M https://vignette.wikia.nocookie.net/survivor/i... http://survivor.wikia.com/wiki/Alan_Ball 10 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
9 10 ^1 In "Double Agent", the vote ended with a 6... None None 1985-03-31 Tribe designation: Upolu\nInspiration in life:... Plantation, Florida Plantation, Fla. Baseball/Dating Coach Poker, writing and exercise. I have little to no patience for ignorance. Pe... Versatile, dynamic and resourceful. I hit my first college home run off of a guy t... None None None None None None Albert Destrade None None M http://vignette2.wikia.nocookie.net/survivor/i... http://survivor.wikia.com/wiki/Albert_Destrade 11 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
In [44]:
pd.read_sql('SELECT * FROM survivor.contestant_season LIMIT 10', con=eng)
Out[44]:
index contestant_season_id occupation location age placement days_lasted votes_against med_evac quit individual_wins season_id attempt_number tribe_0 tribe_1 tribe_2 tribe_3 alliance_0 alliance_1 alliance_2 contestant_id character_id created updated
0 71 71 Aspiring Writer Los Angeles, CA 33 17.0 11.0 6.0 0.0 0.0 0.0 7 1.0 3.0 NaN NaN NaN NaN NaN None 474 9.0 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
1 73 73 Social media marketer Cambridge, Massachusetts 29 2.0 39.0 6.0 0.0 0.0 0.0 18 1.0 39.0 45.0 96.0 NaN 46.0 NaN None 49 23.0 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
2 74 74 Social media marketer Cambridge, Massachusetts 30 5.0 1.0 0.0 0.0 0.0 0.0 30 2.0 87.0 84.0 87.0 91.0 32.0 NaN None 49 23.0 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
3 75 75 Pharmaceutical Sales Saint Louis, MO 26 7.0 30.0 5.0 0.0 0.0 0.0 32 1.0 35.0 158.0 NaN NaN 28.0 57.0 None 660 14.0 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
4 76 76 Vacation Club Sales Fort Worth, Texas 18 14.0 20.0 4.0 0.0 0.0 0.0 4 1.0 6.0 144.0 NaN NaN 78.0 NaN None 458 14.0 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
5 77 77 Vacation Club Sales Fort Worth, Texas 26 7.0 1.0 0.0 0.0 0.0 0.0 30 2.0 87.0 84.0 87.0 91.0 32.0 NaN None 458 21.0 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
6 78 78 Dog trainer Jackson Springs, NC 56 6.0 36.0 11.0 0.0 0.0 3.0 23 1.0 27.0 4.0 86.0 NaN 72.0 52.0 None 277 12.0 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
7 79 79 Administrative Assistant Beaver, PA 22 6.0 33.0 6.0 0.0 0.0 0.0 28 1.0 101.0 161.0 NaN NaN 70.0 NaN None 27 16.0 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
8 80 80 Administrative Assistant Beaver, PA 25 1.0 39.0 6.0 0.0 0.0 0.0 13 2.0 110.0 24.0 NaN NaN 40.0 NaN None 27 16.0 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
9 81 81 Maintenance Supervisor Portland, OR 25 14.0 15.0 6.0 0.0 0.0 1.0 19 1.0 108.0 NaN NaN NaN 65.0 NaN None 222 11.0 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
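
To make the split between the two tables concrete, here is a minimal sketch of how they might be combined. It uses a few toy rows instead of the real database: the ids, seasons and placements are copied from the preview above, while the `label` column and its values are hypothetical placeholders.

```python
import pandas as pd

# Person-level table: one row per contestant.
# "label" is a made-up placeholder column, not part of the real schema.
contestant = pd.DataFrame({
    "contestant_id": [27, 49, 474],
    "label": ["player_a", "player_b", "player_c"],
})

# Season-level table: one row per (contestant, season) appearance.
contestant_season = pd.DataFrame({
    "contestant_season_id": [71, 73, 74, 79, 80],
    "contestant_id": [474, 49, 49, 27, 27],
    "season_id": [7, 18, 30, 28, 13],
    "attempt_number": [1, 2 - 1, 2, 1, 2],
    "placement": [17, 2, 5, 6, 1],
})

# Attach person-level fields to every appearance.
appearances = contestant_season.merge(contestant, on="contestant_id", how="left")

# Roll the appearances back up to one summary row per person.
career = (
    appearances.groupby("contestant_id")
    .agg(
        seasons_played=("season_id", "nunique"),
        best_placement=("placement", "min"),
    )
    .reset_index()
)
print(career)
```

Keeping season-level facts (placement, tribe, age at the time) out of the person-level table means a returning player shows up once in `contestant` and once per appearance in `contestant_season`.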

Something else that may be interesting to look at is the tribes and the alliances. Who is aligned with whom? How strong are they? This could be interesting to look at both historically and for future seasons.

In [45]:
pd.read_sql('SELECT * FROM survivor.tribe LIMIT 10', con=eng)
Out[45]:
index tribe_id summary tribal_history trivia name tribenameorigin tribetype dayformed status lowest_placing_member insigniaimage flagimage buffimage image opponent_0 opponent_1 opponent_2 season_id highest_placing_member created updated
0 139 140 Galang is a tribe from Survivor: Blood vs. Wat... \n\n\n * Galang is the only tribe of Returning Playe... Galang Filipino word meaning "respect" Starting Tribe Day 1 Merged with Tadhana on Day 19 68.0 https://vignette.wikia.nocookie.net/survivor/i... https://vignette.wikia.nocookie.net/survivor/i... https://vignette.wikia.nocookie.net/survivor/i... https://vignette.wikia.nocookie.net/survivor/i... 33.0 NaN None 16.0 18.0 2020-07-11 01:03:00.566347+00:00 2020-07-13 03:35:15.035735+00:00
1 1 2 Jaburu is a tribe from Survivor: The Amazon.\n... None * Jaburu is the first all-female tribe in Sur... Jaburu The Jabiru stork Starting tribe Day 1 Merged with Tambaqui on Day 20 335.0 None https://vignette.wikia.nocookie.net/survivor/i... https://vignette.wikia.nocookie.net/survivor/i... https://vignette.wikia.nocookie.net/survivor/i... 116.0 NaN None 3.0 415.0 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
2 13 171 David None None None None NaN None None None None NaN NaN None NaN NaN 2020-07-18 02:59:05.938012+00:00 2020-07-18 02:59:05.938012+00:00
3 3 4 La Flor (known informally as the Younger Tribe... \n\nDuring the marooning on Day 1, the twenty ... * La Flor is the third tribe in Survivor hist... La Flor Spanish word for "the flower" Starting Tribe Day 1 Merged with Espada on Day 19 190.0 https://vignette.wikia.nocookie.net/survivor/i... https://vignette.wikia.nocookie.net/survivor/i... https://vignette.wikia.nocookie.net/survivor/i... https://vignette.wikia.nocookie.net/survivor/i... 27.0 NaN None 23.0 47.0 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
4 137 138 Villains​ is a tribe from Survivor: Heroes vs.... The Villains.From the beginning, almost every ... * Russell Hantz was not known by the Heroes v... Villains "Deception, Manipulation, and Duplicity" Starting Tribe Day 1 Merged with Heroes on Day 25 23.0 https://vignette.wikia.nocookie.net/survivor/i... https://vignette.wikia.nocookie.net/survivor/i... https://vignette.wikia.nocookie.net/survivor/i... https://vignette.wikia.nocookie.net/survivor/i... NaN NaN None 15.0 NaN 2020-07-11 01:03:00.566347+00:00 2020-07-13 03:35:15.035735+00:00
5 38 39 Chan Loh (ចន្លុះ) is a tribe from Survivor: Ka... \n\nThe "Brains" tribe of the season, Chan Loh... * The Chan Loh beach eventually became Ta Keo... Chan Loh Koh Chanloh, an island off Sihanoukville, Camb... Starting Tribe Day 1 Merged with Gondol on Day 17 389.0 https://vignette.wikia.nocookie.net/survivor/i... https://vignette.wikia.nocookie.net/survivor/i... https://vignette.wikia.nocookie.net/survivor/i... https://vignette.wikia.nocookie.net/survivor/i... 45.0 NaN None 18.0 73.0 2020-07-11 01:03:00.566347+00:00 2020-07-13 03:35:15.035735+00:00
6 7 8 Tandang is a tribe from Survivor: Philippines.... \n\nThe minute the Tandang tribe reached their... * Yellow is one of the three colors represent... Tandang Filipino word meaning, "rooster" Starting Tribe Day 1 Merged with Kalabaw on Day 17 500.0 https://vignette.wikia.nocookie.net/survivor/i... https://vignette.wikia.nocookie.net/survivor/i... https://vignette.wikia.nocookie.net/survivor/i... https://vignette.wikia.nocookie.net/survivor/i... 77.0 66.0 None 35.0 NaN 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
7 8 9 Gitanos​ is the merged tribe of Casaya​​ and L... None * Gitanos merged the earliest (Day 16), later... Gitanos Spanish for "gypsies" Merged tribe Day 16 None 548.0 None https://vignette.wikia.nocookie.net/survivor/i... https://vignette.wikia.nocookie.net/survivor/i... https://vignette.wikia.nocookie.net/survivor/i... NaN NaN None 22.0 28.0 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
8 10 11 Kalo Kalo is the merged tribe of Mokuta and Va... None None Kalo Kalo Fijian word meaning "star" Merged tribe Day 29 None 825.0 None None None None NaN NaN None 44.0 821.0 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
9 126 127 Kasama is the merged tribe of Galang​ and Tadh... * Kasama is the first merged tribe that Monic... Kasama Filipino word meaning "companion" Merged tribe Day 19 None 29.0 None https://vignette.wikia.nocookie.net/survivor/i... https://vignette.wikia.nocookie.net/survivor/i... https://vignette.wikia.nocookie.net/survivor/i... NaN NaN None 16.0 18.0 2020-07-11 01:03:00.566347+00:00 2020-07-13 03:35:15.035735+00:00
In [46]:
pd.read_sql('SELECT * FROM survivor.alliance LIMIT 10', con=eng)
Out[46]:
index summary history trivia name dayformed image alliance_id season_id founder_0 founder_1 founder_2 lowest_placing_member highest_placing_0 highest_placing_1 created updated
0 0 None On Day 6, after reading the clue to the Hidden... * Jill is the only member of the alliance to n... Espada Alliance Day 6 https://vignette.wikia.nocookie.net/survivor/i... 1 23.0 459.0 NaN None 35.0 356.0 NaN 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
1 1 None When Russell Hantz was revealed that he would ... * This alliance is the first to successfully v... Zapatera Six Day 7 https://vignette.wikia.nocookie.net/survivor/i... 2 29.0 329.0 NaN None 625.0 112.0 NaN 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
2 2 None After losing their first Immunity Challenge on... * Lindsey Cascaddan is the only member of the... Escameca Alliance Day 11 https://vignette.wikia.nocookie.net/survivor/i... 3 10.0 348.0 NaN None 479.0 279.0 NaN 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
3 7 After Nagarote lost its first challenge on Day... * Will is the only member of the alliance to m... Nagarote Alliance Day 5 https://vignette.wikia.nocookie.net/survivor/i... 8 10.0 12.0 NaN None 209.0 NaN NaN 2020-07-11 01:03:00.566347+00:00 2020-07-13 03:35:14.421798+00:00
4 4 None The Tandang Alliance was originally formed by ... * Every member of the Tandang tribe was a part... Tandang Alliance Day 1 https://vignette.wikia.nocookie.net/survivor/i... 5 35.0 235.0 500.0 None 500.0 623.0 529.0 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
5 10 Final Four Alliance None None 112 NaN NaN NaN None NaN NaN NaN 2020-07-13 03:35:14.421798+00:00 2020-07-13 03:35:14.421798+00:00
6 6 None On Day 13, the Zhan Hu tribe received a note f... * This was the first pre-merge 3-person allian... Zhan Hu Alliance Day 13 https://vignette.wikia.nocookie.net/survivor/i... 7 24.0 178.0 NaN None 178.0 349.0 NaN 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
7 40 The alliance of original Matsing tribe members... * This was the first alliance to be created by... Matsing Alliance Day 1 https://vignette.wikia.nocookie.net/survivor/i... 41 35.0 549.0 216.0 None 351.0 549.0 NaN 2020-07-11 01:03:00.566347+00:00 2020-07-13 03:35:14.421798+00:00
8 30 None Although not created until after the eliminati... * Jaison Robinson is the only original member... Foa Foa Four Day 18 https://vignette.wikia.nocookie.net/survivor/i... 31 21.0 251.0 NaN None 494.0 624.0 NaN 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
9 23 Witches Coven None None 113 NaN NaN NaN None NaN NaN NaN 2020-07-13 03:35:14.421798+00:00 2020-07-13 03:35:14.421798+00:00
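
As a rough sketch of the "who is aligned with whom" question, the `alliance_0` to `alliance_2` columns of `contestant_season` can be reshaped into one row per membership, and then expanded into pairs of contestants who share an alliance. The rows below are invented for illustration; only the column names come from the tables above.

```python
import itertools
import pandas as pd

nan = float("nan")

# Toy contestant_season rows (made-up ids) using the wide alliance columns.
contestant_season = pd.DataFrame({
    "contestant_id": [1, 2, 3, 4],
    "season_id": [30, 30, 30, 30],
    "alliance_0": [32.0, 32.0, 32.0, 40.0],
    "alliance_1": [nan, 40.0, nan, nan],
    "alliance_2": [nan, nan, nan, nan],
})

# Wide to long: one row per (contestant, alliance) membership.
membership = (
    contestant_season.melt(
        id_vars=["contestant_id", "season_id"],
        value_vars=["alliance_0", "alliance_1", "alliance_2"],
        value_name="alliance_id",
    )
    .dropna(subset=["alliance_id"])
    .astype({"alliance_id": int})
    .drop(columns="variable")
)

# Every unordered pair of contestants that share an alliance.
pairs = sorted(
    (int(a), int(b), int(alliance_id))
    for alliance_id, members in membership.groupby("alliance_id")["contestant_id"]
    for a, b in itertools.combinations(sorted(members), 2)
)
print(pairs)
```

The pair list is effectively an edge list, so it could be fed into a graph library later if the alliance structure turns out to be worth exploring.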

To complement the confessionals from above, some episodes have a wealth of quotes from contestants in particular situations.

Final words contains the final words spoken by contestants at the end of the episode:

In [47]:
pd.read_sql('SELECT * FROM survivor.final_words LIMIT 10', con=eng)
Out[47]:
index contestant_id content season episode_id created updated
0 0 782 I came in claiming that I had the most knowled... 37 0 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
1 1 113 Well, I think this has been, uh, an awesome ex... 6 1 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
2 2 7 I'm a little sad to leave these people because... 6 3 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
3 3 784 I'm feeling real gutted to be voted out. I don... 37 4 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
4 4 551 They kicked off their bug-eating hero instead ... 6 5 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
5 5 242 I think the first two days that I was sick jus... 6 7 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
6 6 795 My Survivor journey's finally come to an end. ... 37 8 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
7 7 34 I guess I want to start by just thanking the L... 6 9 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
8 8 332 I wish I could've made it a little longer. I t... 6 10 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
9 9 785 I am so devastated to lose and to be going hom... 37 12 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00

Voting Confessionals contain words spoken in the voting area each episode (often censored or redacted for dramatic suspense):

In [48]:
pd.read_sql('SELECT * FROM survivor.voting_confessional LIMIT 10', con=eng)
Out[48]:
index voter_id season episode_id type_of_vote initial_or_changed for_or_against content recipient_id created updated
0 0 789 37 0 vote initial against Dee, you're an awesome girl, but, this is a tr... 782.0 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
1 1 259 6 1 vote initial against Got to remove the weakest link in the... crew. 113.0 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
2 2 318 6 1 vote initial against My vote really kills me because I love her, bu... 113.0 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
3 3 257 6 1 vote initial against Stacey. Um, tough call. (sighs) Subtle reasons... 551.0 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
4 4 551 6 1 vote initial against He's just... He's an ornery guy... doesn't rea... 58.0 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
5 5 58 6 1 vote initial against I'm picking her because I think she's the reas... 113.0 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
6 6 34 6 1 vote initial against Just physically it's too much. That's it. That... 113.0 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
7 7 113 6 1 vote initial against I'll miss his, uh, skills and, um, his speakin... 58.0 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
8 8 780 37 2 vote initial against Really sorry mate. But it's just what I need t... 795.0 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
9 9 793 37 2 vote initial against You're awesome, Tony. I'm sorry. 795.0 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
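
Beyond the text itself, the `voter_id` and `recipient_id` columns allow a simple vote tally per episode. The sketch below rebuilds the seven episode 1 rows from the preview above as a toy DataFrame. It assumes each row is one cast vote, which may not hold for revotes or changed votes.

```python
import pandas as pd

# The seven episode_id == 1 rows shown in the preview above.
votes = pd.DataFrame({
    "voter_id": [259, 318, 257, 551, 58, 34, 113],
    "episode_id": [1] * 7,
    "type_of_vote": ["vote"] * 7,
    "for_or_against": ["against"] * 7,
    "recipient_id": [113.0, 113.0, 551.0, 58.0, 113.0, 113.0, 58.0],
})

# Keep only ordinary votes cast against someone.
against = votes[
    (votes["type_of_vote"] == "vote") & (votes["for_or_against"] == "against")
]

# Count votes received per contestant within each episode.
tally = (
    against.groupby(["episode_id", "recipient_id"])
    .size()
    .reset_index(name="votes_received")
    .sort_values(["episode_id", "votes_received"], ascending=[True, False])
)

# The top vote-getter in each episode.
most_votes = tally.groupby("episode_id").head(1)
print(tally)
```

In this small sample, contestant 113 receives four of the seven votes, and the same id appears in the `final_words` preview for episode 1, which is at least consistent with the tally.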

Finally, story quotes contain quotes from the entire episode. Some may overlap with the confessionals section, while others are spoken within the tribes (and so are not considered confessionals). Additionally, when no contestant is attached to a quote (contestant_id is NaN), it may be a voice-over or something said by someone other than the main contestants.

In [49]:
pd.read_sql('SELECT * FROM survivor.story_quotes LIMIT 10', con=eng)
Out[49]:
index contestant_id content season episode_id created updated
0 0 NaN The sixteen contestants have been separated in... 6 1 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
1 1 NaN The Tagi tribe, who will always wear orange, c... 6 1 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
2 2 NaN The Pagong tribe, who will always wear yellow,... 6 1 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
3 3 NaN Here, it's the impressions you make on the oth... 6 1 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
4 4 NaN Throughout their time on the island, tribes wi... 6 1 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
5 5 NaN This is Tribal Council, where each week one me... 6 1 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
6 6 58.0 Paddling over, uh, we had two or three of thos... 6 1 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
7 7 58.0 The hardest part is hanging around with all th... 6 1 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
8 8 58.0 Up until, uh, probably last night, I never gav... 6 1 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
9 9 627.0 He was yelling at everybody "Let's lose the bo... 6 1 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
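
Since a missing `contestant_id` is what distinguishes voice-overs and other non-contestant lines, a first pass at separating them might look like the sketch below. The quote text is placeholder content and the `speaker_type` column is my own hypothetical name; only `contestant_id`, `content` and `episode_id` come from the table.

```python
import pandas as pd

nan = float("nan")

# Toy story_quotes rows: NaN contestant_id means no contestant is attached.
story_quotes = pd.DataFrame({
    "contestant_id": [nan, nan, 58.0, 58.0, 627.0],
    "content": ["line 1", "line 2", "quote 1", "quote 2", "quote 3"],
    "episode_id": [1, 1, 1, 1, 1],
})

# Label each row by whether a contestant is attached to it.
story_quotes["speaker_type"] = (
    story_quotes["contestant_id"]
    .isna()
    .map({True: "non_contestant", False: "contestant"})
)

# How much of each episode comes from contestants vs. everyone else.
counts = story_quotes.groupby(["episode_id", "speaker_type"]).size()
print(counts)
```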

NOTE: You may notice that there is a wealth of information here from the wiki itself. Not all of it has been cleaned: the text fields have been left mostly untouched, while the numeric and datetime columns and the relational foreign keys have been cleaned. Until a good use is found for the text data, it may not make sense to clean it, as I can see there being a lot of information in there that I don't want to discard just yet.

Pushshift.io¶

Pushshift.io is a public API for requesting archived Reddit data. The nice thing is that, unlike Reddit's own API, it does not impose a small cap on how much data can be accessed, nor a limit on total requests. The data is updated every day and contains all of the posts since the inception of the r/survivor subreddit in 2011. r/survivor is a thriving community and a rich data source for engagement with the show, as well as for predictions about it. We have two tables here:

Submissions contains data on the submissions (topics) posted to the subreddit.

Comments contains data on the comments in the subreddit.

In [50]:
pd.read_sql('SELECT * FROM survivor.reddit_submissions LIMIT 10', con=eng)
Out[50]:
index author author_flair_css_class author_flair_text created_utc domain full_link id is_self media_embed num_comments over_18 permalink score selftext subreddit subreddit_id thumbnail title url author_created_utc author_fullname edited distinguished media link_flair_css_class link_flair_text gilded mod_reports retrieved_on secure_media_embed stickied user_reports secure_media post_hint preview locked banned_by contest_mode spoiler brand_safe suggested_sort author_cakeday thumbnail_height thumbnail_width is_video approved_at_utc banned_at_utc can_mod_post view_count ... link_flair_template_id author_flair_background_color author_flair_text_color send_replies no_follow subreddit_subscribers is_original_content previous_visits wls pwls media_only author_id is_meta all_awardings allow_live_comments awarders gildings is_robot_indexable total_awards_received treatment_tags upvote_ratio author_patreon_flair author_premium media_metadata author_flair_template_id removed_by_category event_end event_is_live event_start archived can_gild category content_categories hidden quarantine removal_reason subreddit_name_prefixed collections updated_utc steward_reports og_description og_title removed_by poll_data created_dt most_recent_season within_season most_recent_episode created updated
0 16384 MrUnderdawg S31Pregame Spencer 1430250605 self.survivor https://www.reddit.com/r/survivor/comments/346... 346vvp True {} 17.0 False /r/survivor/comments/346vvp/what_would_you_guy... 0.0 I am 14 and love Survivor and have seen most s... survivor t5_2qhu3 self What would you guys think of a teenage version... http://www.reddit.com/r/survivor/comments/346v... 1.376778e+09 t2_csnqs None None None None None 0.0 None 1.440766e+09 {} False None None None None None None None None None None None None None None None None None None ... None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None 2015-04-28 19:50:05 None None None 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
1 16385 DantheManFoley S31Pregame Savage 1430253710 youtube.com https://www.reddit.com/r/survivor/comments/347... 34739a False {'content': '<iframe class="embedly-embed" ... 6.0 False /r/survivor/comments/34739a/cool_behind_the_sc... 0.0 None survivor t5_2qhu3 http://b.thumbs.redditmedia.com/pzXfFWQiveSigX... Cool behind the scenes Survivor Samoa https://www.youtube.com/watch?v=Z2r4V8TohTQ 1.428826e+09 t2_muy3e None None {'oembed': {'author_name': 'CBS', 'author_url'... None None 0.0 None 1.440766e+09 {'content': '<iframe class="embedly-embed" ... False None {'oembed': {'author_name': 'CBS', 'author_url'... rich:video {'images': [{'id': 'WberhPZBCTrBBVqeOcPSBlPvUw... None None None None None None None None None None None None None None ... None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None 2015-04-28 20:41:50 None None None 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
2 16386 AnghellicKarma S31Pregame Tasha 1430254375 self.survivor https://www.reddit.com/r/survivor/comments/347... 3474so True {} 21.0 False /r/survivor/comments/3474so/spoilers_what_are_... 0.0 Granted, anything can happen, and we know that... survivor t5_2qhu3 self (Spoilers) What are the likely paths to victor... http://www.reddit.com/r/survivor/comments/3474... 1.325176e+09 t2_6je30 None None None None None 0.0 None 1.440766e+09 {} False None None None None None None None None None None None None None None None None None None ... None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None 2015-04-28 20:52:55 None None None 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
3 16387 LenoxTillman_is_ANTM Player Shirin 1430255552 self.survivor https://www.reddit.com/r/survivor/comments/347... 3477jd True {} 45.0 False /r/survivor/comments/3477jd/spoilers_i_think_i... 17.0 No matter what Sierra is going to be in the mi... survivor t5_2qhu3 self [SPOILERS] I think ____ is screwed http://www.reddit.com/r/survivor/comments/3477... 1.416627e+09 t2_jit7n None None None None None 0.0 None 1.440766e+09 {} False None None None None None None None None None None None None None None None None None None ... None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None 2015-04-28 21:12:32 None None None 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
4 16388 numberonepassion S31Pregame Varner 1430257442 self.survivor https://www.reddit.com/r/survivor/comments/347... 347bzu True {} 205.0 False /r/survivor/comments/347bzu/least_deserving_pl... 7.0 Any of the players that have returned on any o... survivor t5_2qhu3 self Least deserving player to ever be brought back? http://www.reddit.com/r/survivor/comments/347b... 1.430160e+09 t2_n5ppp None None None None None 0.0 None 1.440766e+09 {} False None None None None None None None None None None None None None None None None None None ... None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None 2015-04-28 21:44:02 None None None 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
5 16389 [deleted] None None 1430259006 proprofs.com https://www.reddit.com/r/survivor/comments/347... 347fhg False {} 0.0 False /r/survivor/comments/347fhg/puzzlleee/ 1.0 None survivor t5_2qhu3 default puzzlleee http://www.proprofs.com/games/puzzle/sliding/s... NaN None None None None None None 0.0 None 1.440766e+09 {} False None None None None None None None None None None None None None None None None None None ... None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None 2015-04-28 22:10:06 None None None 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
6 16390 [deleted] None None 1430261862 imgur.com https://www.reddit.com/r/survivor/comments/347... 347ln3 False {} 0.0 False /r/survivor/comments/347ln3/the_closest_i_will... 1.0 None survivor t5_2qhu3 default the closest I will ever get to playing survivor. http://imgur.com/2F3kKWe NaN None None None None None None 0.0 None 1.440766e+09 {} False None None None None None None None None None None None None None None None None None None ... None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None 2015-04-28 22:57:42 None None None 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
7 16391 lbblur S31Pregame Keith 1430263877 self.survivor https://www.reddit.com/r/survivor/comments/347... 347pqa True {} 0.0 False /r/survivor/comments/347pqa/how_different_woul... 1.0 None survivor t5_2qhu3 default How different would Survivor South Pacific hav... http://www.reddit.com/r/survivor/comments/347p... NaN None None None None None None 0.0 None 1.440766e+09 {} False None None None None None None None None None None None None None None None None None None ... None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None 2015-04-28 23:31:17 None None None 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
8 16392 numberonepassion S31Pregame Varner 1430264587 self.survivor https://www.reddit.com/r/survivor/comments/347... 347r5o True {} 57.0 False /r/survivor/comments/347r5o/predictions_for_th... 6.0 What do you think will happen? What irrelevant... survivor t5_2qhu3 self Predictions for the reunion show? http://www.reddit.com/r/survivor/comments/347r... 1.430160e+09 t2_n5ppp None None None None None 0.0 None 1.440766e+09 {} False None None None None None None None None None None None None None None None None None None ... None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None 2015-04-28 23:43:07 None None None 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
9 16845 [deleted] None None 1430868753 mobile.twitter.com https://www.reddit.com/r/survivor/comments/34z... 34zzhu False {} 3.0 False /r/survivor/comments/34zzhu/survivors_got_a_bi... 1.0 None survivor t5_2qhu3 default Survivor's Got a Big Announcement Tomorrow Eve... https://mobile.twitter.com/Survivor_Tweet/stat... NaN None None None None None None 0.0 None 1.440753e+09 {} False None None None None None None None None None None None None None None None None None None ... None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None 2015-05-05 23:32:33 None None None 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00

10 rows × 116 columns
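
The `created_utc` column is a Unix timestamp in seconds, and the `created_dt` column in the output above appears to be the same value as a datetime. A minimal sketch of that conversion, using three ids and timestamps copied from the preview (the per-day count is just an illustration of what the datetime column makes easy):

```python
import pandas as pd

# Three submissions from the preview above.
submissions = pd.DataFrame({
    "id": ["346vvp", "34739a", "34zzhu"],
    "created_utc": [1430250605, 1430253710, 1430868753],
    "num_comments": [17.0, 6.0, 3.0],
})

# Unix seconds -> pandas datetime (UTC wall-clock time, timezone-naive).
submissions["created_dt"] = pd.to_datetime(submissions["created_utc"], unit="s")

# Example use: number of submissions per calendar day.
per_day = submissions.groupby(submissions["created_dt"].dt.floor("D")).size()

print(submissions[["id", "created_dt"]])
print(per_day)
```

With a proper datetime column, it becomes straightforward to line posts up against episode air dates, which is presumably what columns like `most_recent_episode` are for.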

In [51]:
pd.read_sql('SELECT * FROM survivor.reddit_comments LIMIT 10', con=eng)
Out[51]:
index author author_created_utc author_flair_css_class author_flair_text author_fullname body controversiality created_utc distinguished gilded id link_id nest_level parent_id reply_delay retrieved_on score score_hidden subreddit subreddit_id edited user_removed mod_removed stickied author_cakeday can_gild collapsed collapsed_reason is_submitter gildings permalink permalink_url updated_utc subreddit_type no_follow send_replies author_flair_template_id author_flair_background_color author_flair_richtext author_flair_text_color author_flair_type rte_mode subreddit_name_prefixed all_awardings associated_award author_patreon_flair author_premium awarders collapsed_because_crowd_control locked total_awards_received treatment_tags steward_reports top_awarded_type created_dt most_recent_season within_season most_recent_episode submission_id created updated
0 647761 sseidl88 1.383535e+09 S31Pregame Spencer t2_drjpb She bout to lose it next episode 0.0 1443106261 None 0.0 cvclc5e t3_3m4uta 2.0 t1_cvc187u 50932.0 1.444555e+09 3.0 None survivor t5_2qhu3 None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None 2015-09-24 14:51:01 11.0 11.0 562.0 None 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
1 647762 jsreid 1.283268e+09 S31Pregame Fishbach t2_4arr8 Episode 1 spoiler or worse? 0.0 1443106288 None 0.0 cvclcso t3_3m58ai 5.0 t1_cvchlyq 6819.0 1.444555e+09 2.0 None survivor t5_2qhu3 None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None 2015-09-24 14:51:28 11.0 11.0 562.0 None 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
2 647763 SawRub 1.321492e+09 S31Pregame Spencer t2_69agp Maybe Spencer sees this as the good three-Brai... 0.0 1443106290 None 0.0 cvclcus t3_3m4uta 3.0 t1_cvc70r9 40938.0 1.444555e+09 5.0 None survivor t5_2qhu3 None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None 2015-09-24 14:51:30 11.0 11.0 562.0 None 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
3 647764 sseidl88 1.383535e+09 S31Pregame Spencer t2_drjpb He was trying so hard not to laugh 0.0 1443106349 None 0.0 cvcle88 t3_3m4uta 2.0 t1_cvc1r7c 50089.0 1.444555e+09 3.0 None survivor t5_2qhu3 None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None 2015-09-24 14:52:29 11.0 11.0 562.0 None 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
4 647765 InquisitorialSquad 1.406635e+09 S31Pregame Fishbach t2_hmisk I've been off this sub-reddit in a while... is... 0.0 1443106370 None 0.0 cvclepd t3_3m576l 3.0 t1_cvc38a0 47585.0 1.444555e+09 1.0 None survivor t5_2qhu3 None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None 2015-09-24 14:52:50 11.0 11.0 562.0 None 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
5 647766 [deleted] NaN None None None [deleted] 0.0 1443106372 None 0.0 cvcler6 t3_3m4uta 4.0 t1_cvc7u7g 39388.0 1.444555e+09 1.0 None survivor t5_2qhu3 None True None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None 2015-09-24 14:52:52 11.0 11.0 562.0 None 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
6 647767 lex_machine 1.402710e+09 S31Pregame Varner t2_gz19g He's got the full schemer edit:\n\n- Aras is t... 0.0 1443106406 None 0.0 cvclfk9 t3_3m6nsz 2.0 t1_cvckx52 796.0 1.444555e+09 1.0 None survivor t5_2qhu3 None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None 2015-09-24 14:53:26 11.0 11.0 562.0 None 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
7 647768 zallirog23 1.387086e+09 S31Pregame Wiglesworth t2_eczt1 She was talking to people, no one mentioned he... 0.0 1443106433 None 0.0 cvclg80 t3_3m70bk 2.0 t1_cvchrgp 6619.0 1.444555e+09 15.0 None survivor t5_2qhu3 None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None 2015-09-24 14:53:53 11.0 11.0 562.0 None 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
8 647769 zallirog23 1.387086e+09 S31Pregame Wiglesworth t2_eczt1 Got her bracelet back, it's a victory. 0.0 1443106467 None 0.0 cvclgzw t3_3m70bk 2.0 t1_cvchrgf 6653.0 1.444555e+09 25.0 None survivor t5_2qhu3 None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None 2015-09-24 14:54:27 11.0 11.0 562.0 None 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
9 647770 SkyborneScout NaN S31Pregame Monica None Didn't Kelly Goldsmith vote out Diane in Episo... 0.0 1443106484 None 0.0 cvclhek t3_3m59f8 3.0 t1_cvc6hs1 42115.0 1.444555e+09 11.0 None survivor t5_2qhu3 None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None 2015-09-24 14:54:44 11.0 11.0 562.0 None 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00

Pushshift.io provides a wealth of information here. To connect comments to the other tables, we have the most_recent_season, within_season and most_recent_episode columns, which relate the date of the post (or comment) to the most recent season and episode in our tables. This will be the subject of our first analysis!
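
As a rough illustration of how a column like most_recent_episode could be derived, here is a minimal sketch that converts a comment's created_utc to a timestamp and looks up the latest episode aired at or before it. The episode IDs and air dates are placeholders invented for this example, and `most_recent_episode` is a hypothetical helper, not the code that built the table above.

```python
from bisect import bisect_right
from datetime import datetime, timezone

# Placeholder episode IDs and air dates (UTC), sorted by air date.
# These values are assumptions for illustration, not rows from the real tables.
EPISODES = [
    (1, datetime(2015, 9, 16, tzinfo=timezone.utc)),
    (2, datetime(2015, 9, 23, tzinfo=timezone.utc)),
    (3, datetime(2015, 9, 30, tzinfo=timezone.utc)),
]
AIR_DATES = [aired for _, aired in EPISODES]


def most_recent_episode(created_utc):
    """Return the ID of the latest episode aired at or before the comment, or None."""
    created_dt = datetime.fromtimestamp(created_utc, tz=timezone.utc)
    # bisect_right counts how many air dates are <= created_dt.
    pos = bisect_right(AIR_DATES, created_dt)
    return EPISODES[pos - 1][0] if pos else None


# created_utc of the first comment in the sample above (2015-09-24 14:51:01 UTC).
episode_id = most_recent_episode(1443106261)
```

With a full DataFrame of comments, `pandas.merge_asof` on the timestamp column would be one way to do the same backward lookup for every row at once.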

Caunce Character Types¶

Last, and certainly not least, are the Caunce character types. These are character types assigned by Angie Caunce, who reports on the show on the popular podcast Rob Has a Podcast. I actually don't know too much about this, but it is talked about quite a bit on the subreddit and other forums. Each contestant is linked to the character type that Angie has assigned them. The types have IDs, like the rows in the other tables, and look like this:

In [53]:
pd.read_sql('SELECT * FROM survivor.role LIMIT 10', con=eng)
Out[53]:
index role_id role description created updated
0 0 0 Good Ol' Boy Southern accent, country roots, often a farmer... 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
1 1 1 Know It All Superfan, highly intelligent, understands stra... 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
2 2 2 Seduce and Destroy Young professional (marketing/sales), “ladies ... 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
3 3 3 The Social Butterfly Sometimes gay, super social, witty, extremely ... 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
4 4 4 Alpha Male Control Freak CEO or doctor, rich and powerful, bossy, contr... 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
5 5 5 The Specialist Eccentric, little crazy, full of himself, huge... 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
6 6 6 True Grit Retired pro athlete or military guy, cop, fire... 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
7 7 7 John McClane 25-35 regular Joe (blue collar job), aggressiv... 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
8 8 8 Surfer Dude Long hair, easy going, very athletic, new agey... 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
9 9 9 Mr. Miagi Kind, wise, intelligent, well spoken, not inte... 2020-07-11 01:03:00.566347+00:00 2020-07-11 01:03:00.566347+00:00
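
To make the contestant-to-role link concrete, here is a small self-contained sketch using an in-memory SQLite database. Only the role_id and role columns and the two role names come from the table above. The `contestant_role` table, its columns, and the contestant names are assumptions made up for this example, so the real schema may differ.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    -- role mirrors two rows of the lookup table shown above.
    CREATE TABLE role (role_id INTEGER PRIMARY KEY, role TEXT);
    -- contestant_role is a hypothetical linking table for illustration.
    CREATE TABLE contestant_role (
        contestant TEXT,
        role_id INTEGER REFERENCES role(role_id)
    );
    INSERT INTO role VALUES (0, 'Good Ol'' Boy'), (1, 'Know It All');
    INSERT INTO contestant_role VALUES ('Contestant A', 1), ('Contestant B', 0);
    """
)

# Resolve each contestant's role_id to its human-readable role name.
rows = con.execute(
    """
    SELECT c.contestant, r.role
    FROM contestant_role AS c
    JOIN role AS r ON r.role_id = c.role_id
    ORDER BY c.contestant
    """
).fetchall()
con.close()
```

Keeping the role names in their own table means a description can be corrected in one place without touching any contestant rows.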

Okay, cool. So What?¶

I totally understand if you are asking yourself this question after making it this far. So what? Who cares about all of this data?

Well, I wanted to write this post because, for the next few months (and possibly beyond...), I will be posting a weekly analysis of Survivor based on these tables. Since the tables can certainly be confusing, I figured the best place to start is where all analysis starts: data collection and cleaning. And while the next few analyses may be more fun to dig into, they will be far less time-consuming, and ultimately much less important, than this step.

One of the coolest parts about all of this is that the generation of these tables now runs in production on my little home server setup. My Raspberry Pis are put to great use scraping and otherwise collecting data from all of these sources every day. Even better, I don't even have to think about it anymore!

As a data scientist, this is a beautiful situation -- I can be entirely self-sufficient and do all of the cool data science stuff that I want! :)

Also, I'm a huge nerd, and I hope you are too and that seeing all of this data makes you drool! If you would like me to share this data with you, please feel free to reach out!

Sean Ammirati - creator of Stats Works. He can be reached on Github, LinkedIn and email.

Published

Oct 17, 2021

Category

Survivor

Tags

  • etl 1
  • project 4
  • survivor 4
