Commit 43c80758 authored by Eva Zangerle

added examples on data ingestion

parent 0b83b436
# gitignore
# jupyter checkpoints
.ipynb_checkpoints
src/.ipynb_checkpoints
**/*.ipynb_checkpoints/
# IPython
Name Postal Code FIPS
Alabama AL 01
Alaska AK 02
Arizona AZ 04
Arkansas AR 05
California CA 06
Colorado CO 08
Connecticut CT 09
Delaware DE 10
Florida FL 12
Georgia GA 13
Hawaii HI 15
Idaho ID 16
Illinois IL 17
Indiana IN 18
Iowa IA 19
Kansas KS 20
Kentucky KY 21
Louisiana LA 22
Maine ME 23
Maryland MD 24
Massachusetts MA 25
Michigan MI 26
Minnesota MN 27
Mississippi MS 28
Missouri MO 29
Montana MT 30
Nebraska NE 31
Nevada NV 32
New Hampshire NH 33
New Jersey NJ 34
New Mexico NM 35
New York NY 36
North Carolina NC 37
North Dakota ND 38
Ohio OH 39
Oklahoma OK 40
Oregon OR 41
Pennsylvania PA 42
Rhode Island RI 44
South Carolina SC 45
South Dakota SD 46
Tennessee TN 47
Texas TX 48
Utah UT 49
Vermont VT 50
Virginia VA 51
Washington WA 53
West Virginia WV 54
Wisconsin WI 55
Wyoming WY 56
American Samoa AS 60
Guam GU 66
Northern Mariana Islands MP 69
Puerto Rico PR 72
Virgin Islands VI 78
\ No newline at end of file
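In the state list, the Name column mixes single-word and multi-word entries ("New Hampshire", "Northern Mariana Islands"), and the header "Postal Code" is two words. A plain whitespace split therefore yields a different number of fields per row. The minimal sketch below assumes the columns are separated by whitespace, since the original delimiter (possibly tabs) is not visible here. It anchors on the two trailing fields, a two-letter postal code and a two-digit FIPS code, and keeps FIPS as a string so that leading zeros such as "01" survive.

```python
import re

# A few rows copied from the state file above. Assumption: columns are
# separated by whitespace (the original delimiter may have been tabs).
SAMPLE = """Name Postal Code FIPS
Alabama AL 01
New Hampshire NH 33
Northern Mariana Islands MP 69
Virgin Islands VI 78"""

# The name may contain spaces, so match the two trailing fields instead:
# a two-letter postal code followed by a two-digit FIPS code.
ROW = re.compile(r"^(?P<name>.+?)\s+(?P<postal>[A-Z]{2})\s+(?P<fips>\d{2})$")


def parse_states(text):
    """Return a list of (name, postal_code, fips) tuples.

    FIPS codes stay strings so that leading zeros ("01") are preserved.
    """
    rows = []
    lines = text.strip().splitlines()
    for line in lines[1:]:  # skip the header line
        match = ROW.match(line.strip())
        if match is None:
            raise ValueError(f"unparseable line: {line!r}")
        rows.append((match["name"], match["postal"], match["fips"]))
    return rows


states = parse_states(SAMPLE)
print(states[1])  # ('New Hampshire', 'NH', '33')
```

Converting FIPS to `int` would turn "01" into 1. That loses the fixed-width code that other datasets use as a join key, so the sketch keeps FIPS as a string.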
{"ts":"2020-06-18T10:44:13", "logged_in":{"username":"foo"}, "connection":{"addr":"1.2.3.4","port":5678}}
{"ts":"2020-06-18T10:44:15", "registered":{"username":"bar","email":"bar@example.com"}, "connection":{"addr":"2.3.4.5","port":6789}}
{"ts":"2020-06-18T10:44:16", "logged_out":{"username":"foo"}, "connection":{"addr":"1.2.3.4","port":5678}}
{"ts":"2020-06-18T10:47:22", "registered":{"username":"baz","email":"baz@example.net"}, "connection":{"addr":"3.4.5.6","port":7890}}
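The log above stores one JSON object per line (JSON Lines). Each record has a `ts` timestamp, a `connection` object, and one more key whose name is the event type (`logged_in`, `registered`, `logged_out`). The sketch below flattens each record into a single-level dict. It assumes that every record has exactly one event key besides `ts` and `connection`, which holds for these four lines but is not guaranteed by the format.

```python
import json

# The four log records shown above, one JSON object per line (JSON Lines).
LOG = """\
{"ts":"2020-06-18T10:44:13", "logged_in":{"username":"foo"}, "connection":{"addr":"1.2.3.4","port":5678}}
{"ts":"2020-06-18T10:44:15", "registered":{"username":"bar","email":"bar@example.com"}, "connection":{"addr":"2.3.4.5","port":6789}}
{"ts":"2020-06-18T10:44:16", "logged_out":{"username":"foo"}, "connection":{"addr":"1.2.3.4","port":5678}}
{"ts":"2020-06-18T10:47:22", "registered":{"username":"baz","email":"baz@example.net"}, "connection":{"addr":"3.4.5.6","port":7890}}
"""


def flatten_event(record):
    """Turn one nested log record into a flat dict.

    Assumption: besides "ts" and "connection", each record has exactly
    one remaining key, and that key names the event type.
    """
    event_keys = [k for k in record if k not in ("ts", "connection")]
    if len(event_keys) != 1:
        raise ValueError(f"expected one event key, got {event_keys}")
    event = event_keys[0]
    flat = {"ts": record["ts"], "event": event}
    flat.update(record[event])  # username, plus email for registrations
    flat["addr"] = record["connection"]["addr"]
    flat["port"] = record["connection"]["port"]
    return flat


# Parse line by line; blank lines are skipped.
events = [flatten_event(json.loads(line)) for line in LOG.splitlines() if line.strip()]
for e in events:
    print(e["ts"], e["event"], e["username"], e.get("email", ""))
```

Only `registered` events carry an email address, so a tabular target has to allow that column to be empty. The loop reflects this by using `.get` with a default.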
Part_No Description Date Price (USD)
12345 Wankle rotary 2020-04-12T15:53:21 555.55
67890 Sousaphone April 12, 2020 333.33
2468 Feather Duster 4/12/2020 22.22
A9922 Area 51 metal 04/12/20 9999.99
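The Date column in the parts table uses four different notations for what appears to be the same day. One common approach is to try a list of candidate formats in order with `datetime.strptime`. The sketch below assumes that slash-separated dates are US-style month/day/year. That reading fits these rows, but the data does not state it: "04/12/20" could also mean 4 December 2020 in a day-first convention.

```python
from datetime import datetime

# Date strings copied from the parts table above.
RAW_DATES = ["2020-04-12T15:53:21", "April 12, 2020", "4/12/2020", "04/12/20"]

# Candidate formats, tried in order. Assumption: slash dates are
# month/day/year (US style); "%y" maps two-digit years 00-68 to 20xx.
FORMATS = ["%Y-%m-%dT%H:%M:%S", "%B %d, %Y", "%m/%d/%Y", "%m/%d/%y"]


def parse_date(text):
    """Return a datetime for the first format that matches, else raise."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue  # try the next candidate format
    raise ValueError(f"no known format matches {text!r}")


parsed = [parse_date(d) for d in RAW_DATES]
print([d.date().isoformat() for d in parsed])
# ['2020-04-12', '2020-04-12', '2020-04-12', '2020-04-12']
```

The order of `FORMATS` matters. "%m/%d/%Y" rejects "04/12/20" because "%Y" needs four digits, so that value falls through to "%m/%d/%y".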
Last Name,First Name,4th Grade,5th Grade,6th Grade
Johnson,Mia,A,B+,A-
Lopez,Liam,B,B,A+
Lee,Isabella,C,C-,B-
Fisher,Mason,B,B-,C+
Gupta,Olivia,B,A+,A
Robinson,Sophia, A+,B-,A
\ No newline at end of file
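In the grades CSV, the last row contains " A+" with a leading space, so a strict reader returns a value that no longer compares equal to "A+". The standard-library `csv` module handles this with the `skipinitialspace` dialect option. The sketch below contrasts the default behaviour with that option.

```python
import csv
import io

# The grades file from above; note the space before "A+" in the last row.
GRADES_CSV = """Last Name,First Name,4th Grade,5th Grade,6th Grade
Johnson,Mia,A,B+,A-
Lopez,Liam,B,B,A+
Lee,Isabella,C,C-,B-
Fisher,Mason,B,B-,C+
Gupta,Olivia,B,A+,A
Robinson,Sophia, A+,B-,A"""

# Default dialect: whitespace after a delimiter is kept as part of the value.
raw = list(csv.DictReader(io.StringIO(GRADES_CSV)))
print(repr(raw[-1]["4th Grade"]))  # ' A+'

# skipinitialspace=True drops whitespace that directly follows a delimiter.
rows = list(csv.DictReader(io.StringIO(GRADES_CSV), skipinitialspace=True))
print(repr(rows[-1]["4th Grade"]))  # 'A+'
```

`skipinitialspace` only removes leading whitespace after delimiters. Trailing spaces would still need an explicit `.strip()` per value.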
@@ -5,34 +5,42 @@
"id": "500bd02c-eee8-45e6-a301-3482638767de",
"metadata": {},
"source": [
"# Data Engineering and Analtics\n",
"# Data Engineering and Analytics\n",
"Master Software Engineering\n",
"\n",
"Eva Zangerle\n",
"\n",
"## General Notes\n",
"* Code is partly taken from other sources, such as books.\n",
"* Sources are annotated as follows:\n",
"* Sources are annotated (and acknowledged!) as follows:\n",
" * (CleaningData): Cleaning Data for Effective Data Science: Doing the other 80% of the work with Python, R, and command-line tools; David Mertz; Packt Publishing, 2021; [Github repo](https://github.com/PacktPublishing/Cleaning-Data-for-Effective-Data-Science/)\n",
"* Unless marked otherwise, code was written by Eva Zangerle\n",
"* Unless marked otherwise, code was written by Eva Zangerle.\n",
"\n",
"\n",
"\n",
"## Virtual environments\n",
"\n",
"![xkcd python environment](https://imgs.xkcd.com/comics/python_environment.png)\n",
"\n",
"[Comic taken from XKCD Comics https://xkcd.com/1987/ (CC-BY)]\n",
"\n",
"\n",
"\n",
"Good tutorial on pipenv and jupyter(-lab): https://towardsdatascience.com/virtual-environments-for-data-science-running-python-and-jupyter-with-pipenv-c6cb6c44a405#\n",
"\n",
"\n",
"## Useful python stuff\n",
"* Startup files: https://ipython.readthedocs.io/en/stable/interactive/tutorial.html#startup-files"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0abb7b0a-84c9-4601-9ded-8130f5b38639",
"metadata": {},
"outputs": [],
"source": [
"*"
"* Startup files: https://ipython.readthedocs.io/en/stable/interactive/tutorial.html#startup-files\n",
"* tqdm progress bars (also for Jupyter): https://github.com/tqdm/tqdm\n",
"\n",
"## Further tools\n",
"* jq command-line JSON processor: https://stedolan.github.io/jq/\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
{
"cells": [
{
"cell_type": "markdown",
"id": "fbb49fdc-c9fe-4231-b80f-aa4fd4ba9f7f",
"metadata": {},
"source": [
"# Dataset Creation"
]
},
{
"cell_type": "markdown",
"id": "a6e4d821-cc1b-4b61-ae02-0023ed6b5982",
"metadata": {},
"source": [
"## Short detour: environments\n",
"\n",
"![xkcd python environment](https://imgs.xkcd.com/comics/python_environment.png)\n",
"\n",
"[Comic taken from XKCD Comics https://xkcd.com/1987/ (CC-BY)]"
]
},
{
"cell_type": "markdown",
"id": "0dd1733a-0608-4eb3-ade7-76e9a1aeb424",
"metadata": {},
"source": [
"Dependency managers for Python:\n",
"* pipenv\n",
"* conda ?\n"
]
},
{
"cell_type": "markdown",
"id": "521e927a-4016-48d6-adb3-86aa07682356",
"metadata": {},
"source": [
"todo:\n",
"* https://towardsdatascience.com/virtual-environments-for-data-science-running-python-and-jupyter-with-pipenv-c6cb6c44a405"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
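The notebook's detour on environments (pipenv, conda) raises a practical question: which interpreter is a given Jupyter kernel actually running? The minimal check below compares `sys.prefix` with `sys.base_prefix`. These differ inside environments created by the standard `venv` module, and inside environments created by tools built on virtualenv, such as pipenv. This check does not detect conda environments. A conda environment ships its own interpreter, so the two prefixes are typically equal there. Inspecting `sys.executable` is the more general hint.

```python
import sys


def in_virtualenv():
    """Return True when running inside a venv/virtualenv-style environment.

    In such environments sys.prefix points at the environment directory,
    while sys.base_prefix still points at the base Python installation.
    Conda environments are not detected by this check.
    """
    return sys.prefix != sys.base_prefix


print("interpreter:", sys.executable)
print("inside a virtual environment:", in_virtualenv())
```

If a notebook reports `False` while you expected the pipenv environment, the kernel is most likely registered against the system interpreter rather than the environment's one.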
This source diff could not be displayed because it is too large.
@@ -12,12 +12,12 @@
},
{
"cell_type": "code",
"execution_count": 281,
"execution_count": 1,
"id": "792af709-0621-4d64-8166-8c8cc28cc73c",
"metadata": {},
"outputs": [],
"source": [
"# import required libs\n",
"# import required packages\n",
"import psycopg2\n",
"import json\n",
"import math\n",
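The ingestion notebook imports `psycopg2` to load data into PostgreSQL. The sketch below runs without a database server by using the standard-library `sqlite3` module instead. Both modules follow the Python DB-API 2.0 (PEP 249), but their placeholder styles differ: `?` in sqlite3 versus `%s` in psycopg2. The `events` table layout and the event-type rule are hypothetical illustrations, not the notebook's actual schema.

```python
import json
import sqlite3

# Two of the JSON log records included in this commit.
LINES = [
    '{"ts":"2020-06-18T10:44:13", "logged_in":{"username":"foo"}, "connection":{"addr":"1.2.3.4","port":5678}}',
    '{"ts":"2020-06-18T10:44:15", "registered":{"username":"bar","email":"bar@example.com"}, "connection":{"addr":"2.3.4.5","port":6789}}',
]

# sqlite3 stands in for PostgreSQL here; with psycopg2 the connection would
# come from psycopg2.connect(...) and the placeholders would be %s.
conn = sqlite3.connect(":memory:")
# Hypothetical table layout for illustration.
conn.execute(
    "CREATE TABLE events (ts TEXT, event TEXT, username TEXT, addr TEXT, port INTEGER)"
)

rows = []
for line in LINES:
    record = json.loads(line)
    # Hypothetical rule: the event type is the one key that is not ts/connection.
    event = next(k for k in record if k not in ("ts", "connection"))
    rows.append((record["ts"], event, record[event]["username"],
                 record["connection"]["addr"], record["connection"]["port"]))

with conn:  # commits the transaction if the block succeeds
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(count)  # 2
```

Passing values as parameters to `executemany`, rather than formatting them into the SQL string, lets the driver handle quoting. That applies equally to psycopg2.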