{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "500bd02c-eee8-45e6-a301-3482638767de",
   "metadata": {},
   "source": [
    "# Data Engineering and Analytics\n",
    "Master Software Engineering\n",
    "\n",
    "Eva Zangerle\n",
    "\n",
    "## General Notes\n",
    "* Some of the code is taken from other sources, such as books.\n",
    "* Sources are annotated (and acknowledged!) as follows:\n",
    "    * (CleaningData): Cleaning Data for Effective Data Science: Doing the other 80% of the work with Python, R, and command-line tools; David Mertz; Packt Publishing, 2021; [GitHub repo](https://github.com/PacktPublishing/Cleaning-Data-for-Effective-Data-Science/)\n",
    "    * (FeatureEng): Feature Engineering for Machine Learning; Alice Zheng and Amanda Casari; O'Reilly, 2018; [GitHub repo](https://github.com/alicezheng/feature-engineering-book)\n",
    "    * (DSHandbook): Python Data Science Handbook; Jake VanderPlas; O'Reilly, 2016; [GitHub repo](https://github.com/jakevdp/PythonDataScienceHandbook)\n",
    "    * (PracticalStatistics): Practical Statistics for Data Scientists: 50+ Essential Concepts Using R and Python; Peter Bruce, Andrew Bruce, and Peter Gedeck; O'Reilly, 2nd edition, 2020; [GitHub repo](https://github.com/gedeck/practical-statistics-for-data-scientists/)\n",
    "    \n",
    "* Unless marked otherwise, code was written by Eva Zangerle.\n",
    "* I deliberately mix different Python packages (e.g., matplotlib, pandas, and seaborn for visualization) to showcase their use.\n",
    "\n",
    "\n",
    "\n",
    "## Virtual environments\n",
    "\n",
    "![xkcd python environment](https://imgs.xkcd.com/comics/python_environment.png)\n",
    "\n",
    "Comic taken from xkcd: https://xkcd.com/1987/ (CC BY-NC 2.5)\n",
    "\n",
    "\n",
    "\n",
    "A good tutorial on using pipenv with Jupyter and JupyterLab: https://towardsdatascience.com/virtual-environments-for-data-science-running-python-and-jupyter-with-pipenv-c6cb6c44a405#\n",
    "\n",
    "\n",
    "## Useful Python stuff\n",
    "* IPython startup files: https://ipython.readthedocs.io/en/stable/interactive/tutorial.html#startup-files\n",
    "* tqdm progress bars (also for Jupyter): https://github.com/tqdm/tqdm\n",
    "* nbval for validating Jupyter notebooks: https://github.com/computationalmodelling/nbval\n",
    "* nbQA for quality assurance of Jupyter notebooks: https://github.com/nbQA-dev/nbQA\n",
    "\n",
    "## Further tools\n",
    "* jq, a command-line JSON processor: https://stedolan.github.io/jq/"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}