Commit b9868d9f authored by Eva Zangerle

initial commit

parent b2f18945
State|Population_2019|Population_2010|House_Seats
California|39512223|37254523|53
Texas|28995881|25145561|36
Florida|21477737|18801310|27
New_York|19453561|+75459
Pennsylvania|12801989|12702379|18
Illinois|12671821|12830632|18
Ohio|11689100|11536504|16
Georgia|10617423|9687653|14
North_Carolina|10488084|9535483|13
Michigan|9986857|9883640|14
New_Jersey|8882190|+90296
Virginia|8535519|8001024|11
Washington|7614893|6724540|10
Arizona|7278717|6392017|9
Massachusetts|6949503|6547629|9
Tennessee|6833174|6346105|9
Indiana|6732219|6483802|9
Missouri|6137428|5988927|8
Maryland|6045680|5773552|8
Wisconsin|5822434|5686986|8
Colorado|5758736|5029196|7
Minnesota|5639632|5303925|8
South_Carolina|5148714|4625364|7
Alabama|4903185|4779736|7
Louisiana|4648794|4533372|6
Kentucky|4467673|4339367|6
Oregon|4217737|3831074|5
Oklahoma|3956971|3751351|5
Connecticut|3565287|3574097|5
Utah|3205958|2763885|4
Iowa|3155070|3046355|4
Puerto_Rico|3193694|3725789|1
Nevada|3080156|2700551|4
Arkansas|3017825|2915918|4
Mississippi|2976149|2967297|4
Kansas|2913314|2853118|4
New_Mexico|2096829|2059179|3
Nebraska|1934408|1826341|3
Idaho|1787065|1567582|2
West_Virginia|1792147|1852994|3
Hawaii|1415872|1360301|2
New_Hampshire|1359711|1316470|2
Maine|1344212|1328361|2
Montana|1068778|989415|1
Rhode_Island|1059361|1052567|2
Delaware|973764|897934|1
South_Dakota|884659|814180|1
North_Dakota|762062|672591|1
Alaska|731545|710231|1
District_of_Columbia|705749|601723|1
Vermont|623989|625741|1
Wyoming|578759|563626|1
Guam|165718|159358|1
U.S._Virgin_Islands|104914|106405|1
American_Samoa|55641|55519|1
Northern_Mariana_Islands|55194|53883|1
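A minimal sketch of how the pipe-delimited table above might be read and checked, assuming only the Python standard library. Two rows (New_York and New_Jersey) carry three fields instead of four, so the sketch separates complete rows from incomplete ones rather than guessing the missing values. The `SAMPLE` string and the `split_rows` helper are illustrative names, not part of the commit.

```python
import csv
import io

# A few rows copied from the table above; the full file would be read the same way.
SAMPLE = """State|Population_2019|Population_2010|House_Seats
California|39512223|37254523|53
New_York|19453561|+75459
Texas|28995881|25145561|36
"""


def split_rows(text):
    """Return (header, good_rows, bad_rows) for pipe-delimited text."""
    reader = csv.reader(io.StringIO(text), delimiter="|")
    header = next(reader)
    good, bad = [], []
    for row in reader:
        # A row counts as complete only when it has one value per header column.
        (good if len(row) == len(header) else bad).append(row)
    return header, good, bad


header, good, bad = split_rows(SAMPLE)
print(len(good), "complete rows;", len(bad), "incomplete:", [r[0] for r in bad])
```

Keeping the incomplete rows in a separate list leaves the decision about how to repair them (drop, look up, or recompute) to a later cleaning step.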
{
"cells": [
{
"cell_type": "markdown",
"id": "500bd02c-eee8-45e6-a301-3482638767de",
"metadata": {},
"source": [
"# Data Engineering and Analytics\n",
"Master Software Engineering\n",
"Eva Zangerle\n",
"\n",
"## General Notes\n",
"* Code is partly taken from further sources, such as books.\n",
"* Sources are annotated as follows:\n",
" * (CleaningData): Cleaning Data for Effective Data Science: Doing the other 80% of the work with Python, R, and command-line tools; David Mertz; Packt Publishing, 2021; [Github repo](https://github.com/PacktPublishing/Cleaning-Data-for-Effective-Data-Science/)\n",
"* Unless marked otherwise, code was written by Eva Zangerle\n",
"\n",
"## Useful Python stuff\n",
"* Startup files: https://ipython.readthedocs.io/en/stable/interactive/tutorial.html#startup-files"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0abb7b0a-84c9-4601-9ded-8130f5b38639",
"metadata": {},
"outputs": [],
"source": [
"*"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
{
"cells": [
{
"cell_type": "markdown",
"id": "fbb49fdc-c9fe-4231-b80f-aa4fd4ba9f7f",
"metadata": {},
"source": [
"# Dataset Creation"
]
},
{
"cell_type": "markdown",
"id": "a6e4d821-cc1b-4b61-ae02-0023ed6b5982",
"metadata": {},
"source": [
"## Short detour: environments\n",
"\n",
"![xkcd python environment](https://imgs.xkcd.com/comics/python_environment.png)\n",
"\n",
"[Comic taken from xkcd, https://xkcd.com/1987/ (CC BY-NC 2.5)]"
]
},
{
"cell_type": "markdown",
"id": "0dd1733a-0608-4eb3-ade7-76e9a1aeb424",
"metadata": {},
"source": [
"Dependency managers for Python:\n",
"* pipenv\n",
"* conda ?\n"
]
},
{
"cell_type": "markdown",
"id": "521e927a-4016-48d6-adb3-86aa07682356",
"metadata": {},
"source": [
"todo:\n",
"* https://towardsdatascience.com/virtual-environments-for-data-science-running-python-and-jupyter-with-pipenv-c6cb6c44a405"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
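As a small companion to the environments detour in the notebook above, the following sketch shows one way to check from inside Python whether the interpreter is running in a virtual environment. It relies on `sys.prefix` differing from `sys.base_prefix`, which holds for environments created with `venv` (and recent `virtualenv`, which pipenv builds on); conda environments are not detected this way. The helper name `in_virtualenv` is illustrative.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a virtual environment, sys.prefix points at the environment,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment:", in_virtualenv())
print("interpreter:", sys.executable)
```

Running this as the first cell of a notebook is a quick way to confirm that the Jupyter kernel uses the intended interpreter.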
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "b1bd37ed-b6d0-47dc-bf3b-65df571cb353",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
{
"cells": [
{
"cell_type": "markdown",
"id": "b5576d1e-7912-4308-ae7d-d6b0571ade38",
"metadata": {},
"source": [
"# Visualization"
]
},
{
"cell_type": "markdown",
"id": "072aa23b-1c58-4b80-a8d4-28db8e1d60fc",
"metadata": {},
"source": [
"todo:\n",
"* plotly and cufflinks (also add to start.py) as follows:\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "557c1313-8fb5-4222-9064-961d8f51f744",
"metadata": {},
"outputs": [
{
"ename": "ModuleNotFoundError",
"evalue": "No module named 'plotly'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mModuleNotFoundError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-1-3bacd8ca2d3f>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# Visualization\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0;32mimport\u001b[0m \u001b[0mplotly\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mplotly\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0mpy\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 3\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mplotly\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mgraph_objs\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0mgo\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0mplotly\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0moffline\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0miplot\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0minit_notebook_mode\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0minit_notebook_mode\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mconnected\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mTrue\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mModuleNotFoundError\u001b[0m: No module named 'plotly'"
]
}
],
"source": [
"\n",
"# Visualization\n",
"import plotly.plotly as py\n",
"import plotly.graph_objs as go\n",
"from plotly.offline import iplot, init_notebook_mode\n",
"init_notebook_mode(connected=True)\n",
"import cufflinks as cf\n",
"cf.go_offline(connected=True)\n",
"cf.set_config_file(theme='pearl')\n",
"\n",
"print('Your favorite libraries have been loaded.')\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "68da3423-ec0c-4f31-866f-01288666cbe6",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"id": "42cf0d5f-6c8d-4fbd-9f08-37dc11f27fc9",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
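The recorded `ModuleNotFoundError` in the Visualization notebook above means plotly was not installed in the kernel's environment. A minimal sketch of checking for the required packages before importing them, using only the standard library; the helper name `missing_modules` is an assumption made for illustration. Separately, in plotly 4 and later the `plotly.plotly` module was moved to the `chart-studio` package, so that import may still need adjusting once plotly is installed.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(["plotly", "cufflinks"])
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("plotly and cufflinks are available.")
```

A check like this could sit at the top of a startup file so that a missing package produces a short message instead of a traceback.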
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "be4ee762-b4c6-421b-beec-ec63bd8c13a5",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "5698e556-8142-44ef-a8c8-6bff87ac6f93",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "d3d9e2fb-6cc6-4750-855f-21b289f6f70e",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "47db5efe-58a3-4be4-9abd-30f73ac0f610",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}