Commit b9868d9f authored by Eva Zangerle

initial commit

parent b2f18945
State|Population_2019|Population_2010|House_Seats
California|39512223|37254523|53
Texas|28995881|25145561|36
Florida|21477737|18801310|27
New_York|19453561|+75459
Pennsylvania|12801989|12702379|18
Illinois|12671821|12830632|18
Ohio|11689100|11536504|16
Georgia|10617423|9687653|14
North_Carolina|10488084|9535483|13
Michigan|9986857|9883640|14
New_Jersey|8882190|+90296
Virginia|8535519|8001024|11
Washington|7614893|6724540|10
Arizona|7278717|6392017|9
Massachusetts|6949503|6547629|9
Tennessee|6833174|6346105|9
Indiana|6732219|6483802|9
Missouri|6137428|5988927|8
Maryland|6045680|5773552|8
Wisconsin|5822434|5686986|8
Colorado|5758736|5029196|7
Minnesota|5639632|5303925|8
South_Carolina|5148714|4625364|7
Alabama|4903185|4779736|7
Louisiana|4648794|4533372|6
Kentucky|4467673|4339367|6
Oregon|4217737|3831074|5
Oklahoma|3956971|3751351|5
Connecticut|3565287|3574097|5
Utah|3205958|2763885|4
Iowa|3155070|3046355|4
Puerto_Rico|3193694|3725789|1
Nevada|3080156|2700551|4
Arkansas|3017825|2915918|4
Mississippi|2976149|2967297|4
Kansas|2913314|2853118|4
New_Mexico|2096829|2059179|3
Nebraska|1934408|1826341|3
Idaho|1787065|1567582|2
West_Virginia|1792147|1852994|3
Hawaii|1415872|1360301|2
New_Hampshire|1359711|1316470|2
Maine|1344212|1328361|2
Montana|1068778|989415|1
Rhode_Island|1059361|1052567|2
Delaware|973764|897934|1
South_Dakota|884659|814180|1
North_Dakota|762062|672591|1
Alaska|731545|710231|1
District_of_Columbia|705749|601723|1
Vermont|623989|625741|1
Wyoming|578759|563626|1
Guam|165718|159358|1
U.S._Virgin_Islands|104914|106405|1
American_Samoa|55641|55519|1
Northern_Mariana_Islands|55194|53883|1
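
The table above is pipe-delimited, and a few rows (for example `New_York` and `New_Jersey`) carry fewer fields than the header. As a minimal sketch, assuming the goal is only to flag such rows before any cleaning, the standard `csv` module can split them out. The `SAMPLE` text and the helper name `split_rows` are illustrative and not part of the commit; the file name is not shown in the source, so the rows are embedded rather than read from disk.

```python
import csv
import io

# A few rows copied from the table above (embedded because the file name
# is not shown in the source).
SAMPLE = """\
State|Population_2019|Population_2010|House_Seats
California|39512223|37254523|53
New_York|19453561|+75459
Pennsylvania|12801989|12702379|18
New_Jersey|8882190|+90296
"""


def split_rows(text, delimiter="|"):
    """Return (header, good_rows, bad_rows).

    A row counts as bad when its number of fields differs from the header's.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    good, bad = [], []
    for row in reader:
        (good if len(row) == len(header) else bad).append(row)
    return header, good, bad


header, good, bad = split_rows(SAMPLE)
print("columns:", header)
print("well-formed rows:", len(good))
print("malformed rows:", [row[0] for row in bad])
```

The malformed rows are only reported here, not repaired; what the `+…` values mean is left to the cleaning step.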
{
"cells": [
{
"cell_type": "markdown",
"id": "500bd02c-eee8-45e6-a301-3482638767de",
"metadata": {},
"source": [
"# Data Engineering and Analytics\n",
"Master Software Engineering\n",
"Eva Zangerle\n",
"\n",
"## General Notes\n",
"* Some of the code is taken from other sources, such as books.\n",
"* Sources are annotated as follows:\n",
" * (CleaningData): Cleaning Data for Effective Data Science: Doing the other 80% of the work with Python, R, and command-line tools; David Mertz; Packt Publishing, 2021; [Github repo](https://github.com/PacktPublishing/Cleaning-Data-for-Effective-Data-Science/)\n",
"* Unless marked otherwise, code was written by Eva Zangerle\n",
"\n",
"## Useful Python stuff\n",
"* Startup files: https://ipython.readthedocs.io/en/stable/interactive/tutorial.html#startup-files"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0abb7b0a-84c9-4601-9ded-8130f5b38639",
"metadata": {},
"outputs": [],
"source": [
"*"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
{
"cells": [
{
"cell_type": "markdown",
"id": "fbb49fdc-c9fe-4231-b80f-aa4fd4ba9f7f",
"metadata": {},
"source": [
"# Dataset Creation"
]
},
{
"cell_type": "markdown",
"id": "a6e4d821-cc1b-4b61-ae02-0023ed6b5982",
"metadata": {},
"source": [
"## Short detour: environments\n",
"\n",
"![xkcd python environment](https://imgs.xkcd.com/comics/python_environment.png)\n",
"\n",
"[Comic taken from XKCD Comics https://xkcd.com/1987/ (CC-BY)]"
]
},
{
"cell_type": "markdown",
"id": "0dd1733a-0608-4eb3-ade7-76e9a1aeb424",
"metadata": {},
"source": [
"Dependency managers for Python:\n",
"* pipenv\n",
"* conda ?\n"
]
},
{
"cell_type": "markdown",
"id": "521e927a-4016-48d6-adb3-86aa07682356",
"metadata": {},
"source": [
"todo:\n",
"* https://towardsdatascience.com/virtual-environments-for-data-science-running-python-and-jupyter-with-pipenv-c6cb6c44a405"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
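
For the environments detour above, a quick check of which interpreter a notebook kernel is using can help. This is a minimal sketch, assuming a `venv`-style environment (pipenv creates one of these); conda environments are typically not detected by this comparison. The helper name `in_virtual_environment` is illustrative.

```python
import sys


def in_virtual_environment():
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("interpreter:", sys.executable)
print("virtual environment active:", in_virtual_environment())
```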
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "b1bd37ed-b6d0-47dc-bf3b-65df571cb353",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
{
"cells": [
{
"cell_type": "markdown",
"id": "b5576d1e-7912-4308-ae7d-d6b0571ade38",
"metadata": {},
"source": [
"# Visualization"
]
},
{
"cell_type": "markdown",
"id": "072aa23b-1c58-4b80-a8d4-28db8e1d60fc",
"metadata": {},
"source": [
"todo:\n",
"* plotly and cufflinks (also add to start.py) as follows:\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "557c1313-8fb5-4222-9064-961d8f51f744",
"metadata": {},
"outputs": [
{
"ename": "ModuleNotFoundError",
"evalue": "No module named 'plotly'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mModuleNotFoundError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-1-3bacd8ca2d3f>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# Visualization\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0;32mimport\u001b[0m \u001b[0mplotly\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mplotly\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0mpy\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 3\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mplotly\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mgraph_objs\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0mgo\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0mplotly\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0moffline\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0miplot\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0minit_notebook_mode\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0minit_notebook_mode\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mconnected\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mTrue\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mModuleNotFoundError\u001b[0m: No module named 'plotly'"
]
}
],
"source": [
"\n",
"# Visualization\n",
"import chart_studio.plotly as py  # plotly.plotly moved to the chart-studio package in plotly 4\n",
"import plotly.graph_objs as go\n",
"from plotly.offline import iplot, init_notebook_mode\n",
"init_notebook_mode(connected=True)\n",
"import cufflinks as cf\n",
"cf.go_offline(connected=True)\n",
"cf.set_config_file(theme='pearl')\n",
"\n",
"print('Your favorite libraries have been loaded.')\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "68da3423-ec0c-4f31-866f-01288666cbe6",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"id": "42cf0d5f-6c8d-4fbd-9f08-37dc11f27fc9",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
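
The recorded cell output in the Visualization notebook is a `ModuleNotFoundError` for `plotly`. One way to get a clearer message is to check that the libraries are installed before importing them. This is a sketch only; the helper name `missing_modules` and the `required` list are assumptions for illustration, not part of the commit.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Libraries the visualization cell imports.
required = ["plotly", "cufflinks"]
missing = missing_modules(required)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All visualization libraries are available.")
```

The check only looks up top-level package names; it does not verify versions or submodules.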
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "be4ee762-b4c6-421b-beec-ec63bd8c13a5",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "5698e556-8142-44ef-a8c8-6bff87ac6f93",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "d3d9e2fb-6cc6-4750-855f-21b289f6f70e",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "47db5efe-58a3-4be4-9abd-30f73ac0f610",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}