name: inverse
layout: true
class: center, middle, inverse
---
# A spatial analysis of non-English Twitter activity in Houston, TX

Matthew Haffner

Department of Geography

Oklahoma State University

https://mhaffner.github.io

.footnote[created with remark.js]
---
layout: false
.left-column[
## Introduction
### - VGI/CGI
]
.right-column[
- Volunteered (Goodchild 2007) or contributed (Harvey 2013) geographic information (VGI/CGI) can supplement (or replace) conventional data sources (See et al. 2016)
- Cities in particular are becoming more reliant on big data (Kitchin 2013)
- CGI is becoming more representative of the general population (Greenwood, Perrin, and Duggan 2016; Zickuhr 2013)
]
---
.left-column[
## Introduction
### - VGI/CGI
### - Data quality
]
.right-column[
- User-contributed sources of geographic information suffer from a number of shortcomings
- A wealth of research has been conducted on VGI/CGI _accuracy_, but much less has examined the _validity_ of VGI/CGI, specifically location-based social media (LBSM), for studying spatial processes
]
---
.left-column[
## Introduction
### - VGI/CGI
### - Data quality
## Research focus
### - Questions
]
.right-column[
1. Can conventional variables explain the production of Twitter activity by non-English users?
2. How does LBSM inform us about users' behaviors?
3. Is LBSM valid for studying spatial processes?
]
---
.left-column[
## Introduction
### - VGI/CGI
### - Data quality
## Research focus
### - Questions
### - Approach
]
.right-column[
- Counts of _users_ rather than _tweets_
- _Account_ language rather than _tweet_ language
]
---
.left-column[
## Introduction
### - VGI/CGI
### - Data quality
## Research focus
### - Questions
### - Approach
]
.right-column[
- Counts of _users_ rather than _tweets_
- _Account_ language rather than _tweet_ language

```json
"_source": {
  "created_at": "Sat Sep 03 11:57:06 +0000 2016",
  "text": "So happy to spend my senior year by your side 💓 @ Green Lake, Wisconsin https://t.co/BLWIObnrxD",
  "user": {
    "location": "WI, USA",
    "description": "livin' life, like I lived twice",
    "followers_count": 786,
    "friends_count": 638,
    "favourites_count": 4758,
    "statuses_count": 9052,
    "time_zone": "Madrid",
*   "lang": "es"
  },
  "geo": {
    "type": "Point",
    "coordinates": [
      43.84277778,
      -88.95777778
    ]
  },
* "lang": "en"
}
```
]
---
.left-column[
## Introduction
### - VGI/CGI
### - Data quality
## Research focus
### - Questions
### - Approach
]
.right-column[
- _Precise_ location as opposed to _general_
]
---
.left-column[
## Introduction
### - VGI/CGI
### - Data quality
## Research focus
### - Questions
### - Approach
]
.right-column[
- _Precise_ location as opposed to _general_

```json
"geo": {
  "type": "Point",
* "coordinates": [
*   43.84277778,
*   -88.95777778
  ]
},
"place": {
  "full_name": "Wisconsin, USA",
  "bounding_box": {
    "type": "Polygon",
*   "coordinates": [
*     [
*       [
*         -92.889433,
*         42.491921
*       ],
*       [
*         -92.889433,
*         47.309715
*       ],
*       [
*         -86.24955,
*         47.309715
*       ],
*       [
*         -86.24955,
*         42.491921
        ]
      ]
    ]
  }
}
```
]
---
.left-column[
## Introduction
### - VGI/CGI
### - Data quality
## Research focus
### - Questions
### - Approach
## Data & methods
]
.right-column[
- Counts of users with an account language other than English (i.e.
non-English Twitter users) who produced a geotagged tweet within census tracts in Harris County, Texas (Houston area) from October 2015 to November 2016.
- Geographically weighted regression (GWR)
]
---
.left-column[
## Introduction
### - VGI/CGI
### - Data quality
## Research focus
### - Questions
### - Approach
## Data & methods
]
.right-column[
.center[Variables]
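
The user counts described under Data & methods could be assembled along the following lines. This is only a minimal sketch, not the code used in the study: it assumes each record is a dict shaped like the `_source` object shown under Approach, that each `user` object also carries an `id_str` field (not shown in the excerpt), and that a hypothetical `tract_of(lat, lon)` lookup returns a census tract ID or `None`.

```python
def count_non_english_users(tweets, tract_of):
    """Count distinct non-English-account users per census tract.

    tweets: iterable of dicts shaped like the `_source` object in the slides.
    tract_of: hypothetical callable mapping (lat, lon) to a tract ID or None.
    """
    users_by_tract = {}
    for tweet in tweets:
        user = tweet.get("user") or {}
        geo = tweet.get("geo")
        # Keep precise point locations only; place-only (bounding box) tweets are skipped.
        if not geo or geo.get("type") != "Point":
            continue
        # Filter on the account language, not the tweet-level "lang".
        # Assumption: a missing account language is treated as English.
        if user.get("lang", "en").lower().startswith("en"):
            continue
        lat, lon = geo["coordinates"]  # the "geo" field stores latitude first
        tract = tract_of(lat, lon)
        if tract is None:
            continue
        # A set per tract means each user is counted once, however many tweets they sent.
        users_by_tract.setdefault(tract, set()).add(user.get("id_str"))
    return {tract: len(ids) for tract, ids in users_by_tract.items()}
```

In practice `tract_of` would be a point-in-polygon test against tract boundaries; any spatial library could supply it.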
]
---
.left-column[
## Introduction
### - VGI/CGI
### - Data quality
## Research focus
### - Questions
### - Approach
## Data & methods
## Results
]
.right-column[
.center[GWR Results
]]
---
.left-column[
## Introduction
### - VGI/CGI
### - Data quality
## Research focus
### - Questions
### - Approach
## Data & methods
## Results
]
.right-column[
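
As a rough illustration of how GWR produces a separate coefficient for every tract, the sketch below fits one kernel-weighted regression per location. It is a single-predictor simplification with a fixed Gaussian kernel, not the model specification used in the study; the function name, the `coords` input (tract centroids), and the `bandwidth` value are all assumptions made for the example.

```python
import math

def gwr_single_predictor(coords, x, y, bandwidth):
    """Return one (intercept, slope) pair per location.

    coords: list of (easting, northing) centroids, one per tract (hypothetical input).
    x, y: predictor and response values, one per tract.
    bandwidth: Gaussian kernel bandwidth, in the same units as coords.
    """
    local_fits = []
    for cx, cy in coords:
        # Gaussian kernel: nearby tracts get weights near 1, distant ones near 0.
        weights = [
            math.exp(-0.5 * (math.hypot(px - cx, py - cy) / bandwidth) ** 2)
            for px, py in coords
        ]
        total = sum(weights)
        x_bar = sum(w * xi for w, xi in zip(weights, x)) / total
        y_bar = sum(w * yi for w, yi in zip(weights, y)) / total
        # Closed-form weighted least squares for a single predictor.
        sxy = sum(w * (xi - x_bar) * (yi - y_bar) for w, xi, yi in zip(weights, x, y))
        sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
        slope = sxy / sxx
        local_fits.append((y_bar - slope * x_bar, slope))
    return local_fits
```

Mapping the per-tract slopes is what yields a coefficient surface; with several predictors the same idea applies, but each local fit becomes a weighted multiple regression.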
.center[Percent White GWR Results
]]
---
.left-column[
## Introduction
### - VGI/CGI
### - Data quality
## Research focus
### - Questions
### - Approach
## Data & methods
## Results
]
.right-column[
.center[Outliers of Non-English Twitter Users
]]
---
.left-column[
## Introduction
### - VGI/CGI
### - Data quality
## Research focus
### - Questions
### - Approach
## Data & methods
## Results
## Conclusion
]
.right-column[
### Overarching thoughts/words of caution
- Independent variables vary wildly within the top 8 tracts by non-English Twitter users (NETU)
- The effect of the number of employees (JOBS) is much stronger than the effect of residential population (POP)
- Information on place versus users
]
---
.left-column[
## Introduction
### - VGI/CGI
### - Data quality
## Research focus
### - Questions
### - Approach
## Data & methods
## Results
## Conclusion
]
.right-column[
### Overarching thoughts/words of caution
- Independent variables vary wildly within the top 8 tracts by non-English Twitter users (NETU)
- The effect of the number of employees (JOBS) is much stronger than the effect of residential population (POP)
- Information on place versus users

### Positive findings
- Differences between data sources mean something... right?
- Appearance of some unexpected languages
]
---
template: inverse

Haffner, M. 2018. A spatial analysis of non-English Twitter activity in Houston, TX. Accepted for publication in _Transactions in GIS_.

https://mhaffner.github.io/presentations/non-english-tweets.html
---
# References

- Goodchild, M. F. (2007). Citizens as sensors: The world of volunteered geography. _GeoJournal_, 69(4), 211–221.
- Harvey, F. (2013). To volunteer or contribute locational information? Towards truth in labeling crowdsourced geographic information. In D. Sui, S. Elwood, & M. F. Goodchild (Eds.), _Crowdsourcing geographic knowledge_ (pp. 31–42). Dordrecht, Netherlands: Springer.
- Kitchin, R. (2013). The real-time city? Big data and smart urbanism. _GeoJournal_, 79(1), 1–14.
- Zickuhr, K. (2013). Location-based services. Washington, DC: Pew Research Center.