{ "cells": [ { "cell_type": "markdown", "id": "500bd02c-eee8-45e6-a301-3482638767de", "metadata": {}, "source": [ "# Data Engineering and Analytics\n", "Master Software Engineering\n", "\n", "Eva Zangerle\n", "\n", "## General Notes\n", "* Some of the code is taken from other sources, such as books.\n", "* Sources are annotated (and acknowledged!) as follows:\n", " * (CleaningData): Cleaning Data for Effective Data Science: Doing the other 80% of the work with Python, R, and command-line tools; David Mertz; Packt Publishing, 2021; [GitHub repo](https://github.com/PacktPublishing/Cleaning-Data-for-Effective-Data-Science/)\n", " * (FeatureEng): Feature Engineering for Machine Learning; Alice Zheng and Amanda Casari; O'Reilly, 2018; [GitHub repo](https://github.com/alicezheng/feature-engineering-book)\n", " * (DSHandbook): Python Data Science Handbook; Jake VanderPlas; O'Reilly, 2016; [GitHub repo](https://github.com/jakevdp/PythonDataScienceHandbook)\n", " * (PracticalStatistics): Practical Statistics for Data Scientists: 50+ Essential Concepts Using R and Python; Peter Bruce, Andrew Bruce, and Peter Gedeck; O'Reilly, 2nd edition, 2020; [GitHub repo](https://github.com/gedeck/practical-statistics-for-data-scientists/)\n", " \n", "* Unless marked otherwise, code was written by Eva Zangerle.\n", "* I deliberately mix different Python packages (e.g., matplotlib, pandas, and seaborn for visualization) to showcase their use.\n", "\n", "\n", "\n", "## Virtual environments\n", "\n", "![xkcd python environment](https://imgs.xkcd.com/comics/python_environment.png)\n", "\n", "Comic taken from xkcd: https://xkcd.com/1987/ (CC BY-NC 2.5)\n", "\n", "\n", "\n", "Good tutorial on pipenv and Jupyter(Lab): https://towardsdatascience.com/virtual-environments-for-data-science-running-python-and-jupyter-with-pipenv-c6cb6c44a405#\n", "\n", "\n", "## Useful Python stuff\n", "* IPython startup files: https://ipython.readthedocs.io/en/stable/interactive/tutorial.html#startup-files\n", "* tqdm progress bars (also for Jupyter): https://github.com/tqdm/tqdm\n", "* nbval for validating Jupyter notebooks: https://github.com/computationalmodelling/nbval\n", "* nbQA for quality assurance of Jupyter notebooks: https://github.com/nbQA-dev/nbQA\n", "\n", "## Further tools\n", "* jq, a command-line JSON processor: https://stedolan.github.io/jq/" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" } }, "nbformat": 4, "nbformat_minor": 5 }