Start our thinking about projects.
Allow students time to continue working on the Day_08_A_Metro_Diversity assignment.
Participants will work on tangible projects related to the overall theme.
The projects need to be thorough analyses of some open dataset or datasets.
Students can select from a list of projects or they can propose other projects of comparable scope and intent.
Students will have opportunities to brainstorm ideas, choose a specific focal point (drawing from structured feedback from other students and the instructor), craft a proposal for their projects, and then present their work at the end of the course.
Due: Wed, April 2, 2014 11:59pm.
Write a 1-2 page proposal on how your group (of 2-3 team members) will conduct a thorough analysis of some open dataset or datasets. Use the following template for your proposal:
I'd gladly help any of you flesh out any of these ideas:
building a machine-readable campus map
building a campus org chart -- a people finder
surfacing images and recontextualizing data in Open Context: Data Publication for Cultural Heritage and Field Research. See Project-Starter_OpenContext.ipynb
Free Law Project: reconciling court cases to Wikipedia and winning at FantasySCOTUS from the Harlan Institute.
connect the Hypothes.is annotation API with the IPython notebook to enable annotation of notebooks as well as items computed in the notebook. Throw in http://futurepress.github.io/epub.js/#getting-started and public domain books....
Digital Public Library of America: hack ideas from dp.la Open Committee Call: Technical Advisory, December 4, 2:00 PM Eastern http://dp.la/info/developers/hacking-projects/ and http://j.mp/dpla-hack-ideas
systematically archiving federal government data to archive.org: I tweeted: "Does it make sense to systematically archive US govt data to @internetarchive? Remember how http://census.gov gone during #shutdown" Eric Kansa says yes and feds should pay for it. I have an idea of doing some selective archiving to archive.org as a class project – but would it conflict with the business model of https://archive-it.org/? It's time to try out: https://github.com/jjjake/ia-wrapper.
OCLC provides downloadable linked data file for the 1 million most widely held works in WorldCat and Harvard Library Bibliographic Dataset | Open Metadata (12 million bibliographic records)
help keep Amazon Public Data Sets up to date. E.g., can we help get the 2010 census loaded (to supplement the older 2000 data)? One thing we can do as a class is to curate and upload data sets to AWS S3. We'll typically want to put a lot of our public data in a common place – Amazon S3 is a great place since we'll likely do a lot of analysis using AWS.
rebuild the Racial Dot Map to enable the mapping of arbitrary 2010 Census data variables.
Working with Wikimedia data of all sorts, using the MediaWiki API, Wikipedia:Database download - Wikipedia, the free encyclopedia and the Tools Lab.
CommonCrawl data -- we have starter code from last year to build from.
Berkeley Ecoinformatics Engine: An open API serving UC Berkeley's Natural History Data
New California Water Atlas | Making Water Understandable in California
eLife Lens: What can we do with xml source?
government org chart: parse US Government Printing Office - FDsys - Home
real-time BART data: The Real BART API | bart.gov
real-time AC transit data via nextbus API: identify the phantom buses.
weather data: comparing predictions to what actually happened
Flickr Commons photos: over a million public domain photos in Flickr.
and many, many more
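To give a feel for the scale of a first step, here is a minimal sketch for the weather idea above (comparing predictions to what actually happened). It assumes you have already collected paired forecast and observed values; the function name `forecast_errors` and the sample numbers are made up for illustration and do not come from any real weather source.

```python
from statistics import mean


def forecast_errors(predicted, observed):
    """Return (mean absolute error, mean signed error) for paired values.

    A positive signed error means the forecasts ran high on average.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not predicted:
        raise ValueError("need at least one pair of values")
    diffs = [p - o for p, o in zip(predicted, observed)]
    mae = mean(abs(d) for d in diffs)
    bias = mean(diffs)
    return mae, bias


# Hypothetical daily high temperatures, invented for this example only.
predicted_highs = [60, 62, 65, 58]
observed_highs = [61, 60, 65, 62]

mae, bias = forecast_errors(predicted_highs, observed_highs)
print("mean absolute error:", mae)
print("mean signed error (bias):", bias)
```

A real project would replace the two lists with data pulled from an open forecast archive and an observation record, and would likely break the errors down by forecast lead time.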
Answers published for 2nd Notebook assignment: nbviewer.ipython.org/github/rdhyee/working-open-data-2014/blob/master/notebooks/Day_06_E_Assignment_Answers.ipynb
CodeAcross San Francisco (Eventbrite): I'll be there this Saturday. Anyone want to join in?