Main Textbook
Wes McKinney. Python for Data Analysis. (O'Reilly Media, 2012). I strongly recommend getting a paper copy as well as accessing any electronic versions.
INFO 290T - Working with Open Data
http://www.ischool.berkeley.edu/courses/290t-wod Spring 2014 / CCN: 41620
T,Th 2:00-3:30pm 202 South Hall
Office Hours: T, Th 3:30-4:30pm, 302 South Hall (along with possible virtual office hours)
Instructor: Raymond Yee, Ph.D.
Contact info:
Twitter: @WorkingOpenData / @rdhyee
Tutor: AJ Renold
bcourses site to be unveiled soon...
Open data -- data that is free for use, reuse, and redistribution -- is an intellectual treasure-trove that has given rise to many unexpected and often fruitful applications. In this course, students will
learn how to access, visualize, clean, interpret, and share data, especially open data, using Python, Python-based libraries, and supplementary computational frameworks
understand the theoretical underpinnings of open data and their connections to implementations in the physical and life sciences, government, social sciences, and journalism.
Working with Open Data (WwOD) is a technical course with a strong focus on the social-political context and domains of application of open data.
Info 206 Distributed Computing Applications and Infrastructure or equivalent background with Python.
Grading Scheme:
Subject to Change
Working through IPython Notebooks created by the instructor is the primary vehicle for learning at the beginning of the course.
See A gallery of interesting IPython Notebooks · ipython/ipython Wiki
I plan to supplement the book with materials covering the following topics:
open data, open content in various fields
using JavaScript, HTML5, CSS together with Python for data presentation, analysis, and visualization, (e.g., d3.js)
In addition to survey materials on the public domain, creative commons, and open data movements, I'll focus us on
and other data sets still to be determined, probably large open scientific data sets
A narrative about last year's course co-written by Fernando Perez and Raymond Yee: Exploring Open Data with Pandas and IPython at the Berkeley I School -- includes abstracts of last year's projects.
Working on exercises -> Working on Projects
The US Census + the Wikipedia is an integrating framework
standard Python -> Python in the context of the IPython Notebook -> integrating JavaScript
computing on Wakari -> computing on notebook -> computing on a cluster (and in the cloud)
Last revised: 2014.03.20
It is the student’s responsibility to notify the instructor(s) in writing by the second week of the semester of any potential conflict(s) and to recommend a solution, with the understanding that an earlier deadline or date of examination may be the most practicable solution.
It is the student’s responsibility to inform him/herself about material missed because of an absence, whether or not he/she has been formally excused.
Students should not come to class if they become ill. The University has adopted the CDC recommendation that members of the campus community who develop flu-like illness should self-isolate until at least 24 hours after they are free of fever or signs of fever without the use of medication. Please follow this recommendation in deciding whether or not to come to class.
In return: there will be flexibility and good judgment in how course requirements will be handled.
Participants will work on tangible projects related to the overall theme.
The projects need to be thorough analyses of some open dataset or datasets.
Students can select from a list of projects or they can propose other projects of comparable scope and intent.
Students will have opportunities to brainstorm ideas, choose a specific focal point (drawing from structured feedback from other students and the instructor), craft a proposal for their projects, and then present their work at the end of the course.
I would like everyone to bring a notebook computer to class so that we can work together in class on programming assignments. If you are not able to do so, check in with me.
see McKinney's narration: http://proquest.safaribooksonline.com/book/programming/python/9781449323592/1dot-preliminaries/id2700570
From http://en.wikipedia.org/w/index.php?title=Special:Cite&page=Open_data&id=532390265:
Open data is the idea that certain data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control.
A piece of content or data is open if anyone is free to use, reuse, and redistribute it — subject only, at most, to the requirement to attribute and/or share-alike.
OKCon - Open Knowledge Conference (Sept 2013 in Geneva)
http://okfn.org/wg/ includes:
In development:
OKFN
http://opendatahandbook.org/en/#
and
With any luck, we will not only understand how the map works, we'll also be able to reproduce it and enhance it by the end of the semester. That is, we'll learn how to turn Census 2010 data into a map.
Group activity -- discuss and enter answers at http://bit.ly/wwod1401Q
What do you hope to learn and accomplish in the course?
Name 2 to 4 types of data (or datasets) that interest or intrigue you. Bonus: explain why.
What's one of the more complicated examples of Python programming you've done so far?
What questions do you have for the instructor?
We'll study the population of countries before we dive into the US Census.
For homework and in the next class, we'll focus on getting your own laptop ready for programming.
For today, I want you to sign up for the free account on Wakari.io.
Then, I'll show you how to load up today's notebook into Wakari: https://raw.github.com/rdhyee/working-open-data-2014/master/notebooks/Day_01_A_World_Population.ipynb
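Before opening the notebook, it can help to know that an .ipynb file is plain JSON. The following is a minimal sketch, not the course's own tooling: it counts the cells in a notebook by type. It assumes the notebook follows either the older layout (cells nested under "worksheets") or the newer one (a top-level "cells" list). The names iter_cells, summarize_notebook, and SAMPLE_V3 are made up for illustration, and the sample notebook is a tiny stand-in, not the Day 1 notebook.

```python
import json


def iter_cells(notebook):
    """Yield every cell from a parsed notebook dict.

    Handles a top-level "cells" list as well as cells nested
    inside a "worksheets" list (assumed layouts, see lead-in).
    """
    for cell in notebook.get("cells", []):
        yield cell
    for worksheet in notebook.get("worksheets", []):
        for cell in worksheet.get("cells", []):
            yield cell


def summarize_notebook(notebook_json):
    """Return a dict mapping cell type to the number of such cells."""
    notebook = json.loads(notebook_json)
    counts = {}
    for cell in iter_cells(notebook):
        kind = cell.get("cell_type", "unknown")
        counts[kind] = counts.get(kind, 0) + 1
    return counts


# Hypothetical stand-in notebook with one markdown cell and one code cell.
SAMPLE_V3 = json.dumps({
    "nbformat": 3,
    "nbformat_minor": 0,
    "metadata": {"name": ""},
    "worksheets": [{
        "cells": [
            {"cell_type": "markdown", "source": ["# A title"]},
            {"cell_type": "code", "input": ["1 + 1"],
             "language": "python", "outputs": []},
        ]
    }],
})

print(summarize_notebook(SAMPLE_V3))
```

To try this on the real notebook, download the file from the URL above and pass its text to summarize_notebook.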
(Jan 21) Twitter / sfopendrinks: Lovers of openness! Join us ...
(Feb 1) Wikipedia:Meetup/ArtAndFeminism - Wikipedia, the free encyclopedia
(Apr 3-4, 5-6) Twitter / hypothes_is: Save the date: I Annotate 2014. ...
(July 15-18, 2014 in Berlin) OKFestival 2014
AJ wrote a nice set of notes on how to do so: https://github.com/rdhyee/working-open-data-2014/wiki/IPython-Installation-Options
PfDA, Chap 3: IPython: An Interactive Computing and Development Environment
PfDA, Appendix: Python Language Essentials -- to help remind yourself of key elements of standard Python
PfDA, Chap 2: Introductory Examples
optional video to give you a sense of the huge possibilities of IPython: SciPy 2013 :: IPython in depth
Reading the news, world news, local news, tech news, understanding new contexts, deepening old interests, controlled serendipity.
"The purpose of Data.gov is to increase public access to high value, machine readable datasets generated by the Executive Branch of the Federal Government."
A primary goal of Data.gov is to improve access to Federal data and expand creative use of those data beyond the walls of government by encouraging innovative ideas (e.g., web applications). Data.gov strives to make government more transparent and is committed to creating an unprecedented level of openness in Government. The openness derived from Data.gov will strengthen our Nation's democracy and promote efficiency and effectiveness in Government.
Data.gov -> Earthquake Feeds - Data.gov -> Real-time Feeds & Notifications -> KML Format -> feed http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/1.0_week_age.kml to Google Maps:
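The same kind of feed can also be read directly in Python instead of being handed to Google Maps. The following is a minimal sketch using only the standard library. It assumes that each earthquake appears as a KML Placemark with a name and a "lon,lat,alt" coordinates string; the real feed may be structured differently. The names local_name, parse_placemarks, fetch_kml, and SAMPLE_KML are made up for illustration, and the two sample placemarks are invented test data, not real earthquakes.

```python
import urllib.request
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def parse_placemarks(kml_text):
    """Return a list of {'name', 'lon', 'lat'} dicts, one per Placemark."""
    root = ET.fromstring(kml_text)
    placemarks = []
    for elem in root.iter():
        if local_name(elem.tag) != "Placemark":
            continue
        name = None
        coords = None
        # Take the first <name> and <coordinates> found inside the Placemark.
        for child in elem.iter():
            tag = local_name(child.tag)
            if tag == "name" and name is None:
                name = (child.text or "").strip()
            elif tag == "coordinates" and coords is None:
                coords = (child.text or "").strip()
        if not coords:
            continue  # skip placemarks without a point
        # KML orders coordinates as longitude, latitude, optional altitude.
        lon, lat = [float(x) for x in coords.split(",")[:2]]
        placemarks.append({"name": name, "lon": lon, "lat": lat})
    return placemarks


def fetch_kml(url):
    """Download a KML document as text (needs network access; not called here)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


# Invented sample data in the assumed shape, so the sketch runs offline.
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>Example event A</name>
      <Point><coordinates>-122.25,37.87,0</coordinates></Point>
    </Placemark>
    <Placemark>
      <name>Example event B</name>
      <Point><coordinates>139.69,35.68,0</coordinates></Point>
    </Placemark>
  </Document>
</kml>"""

for quake in parse_placemarks(SAMPLE_KML):
    print(quake["name"], quake["lat"], quake["lon"])
```

To run it against live data, pass the feed URL given above to fetch_kml and hand the result to parse_placemarks.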
Traditional motivations given for open government data:
My personal interests in the area:
Open data useful testbed for working on data of all sorts, because of zero financial costs and minimal restrictions on use, reuse, redistribution
Growing community around open data because of these low barriers...democratization of data...many more of us can participate in working with open data and attract a wide range of people
I love to learn and to think and to understand; I am a big believer in computational and information systems as mind augmenters/extenders and in open data (as open digital access in general) as an important part of building a powerful personal information manager, writ large
Like open source software as an enabler, catalyst, and foundation...you can stand on the shoulders of giants.