Dealing with file systems: Command Line Automation in Python


SLIDE 1

Dealing with file systems

COMMAND LINE AUTOMATION IN PYTHON

Noah Gift

Lecturer, Northwestern & UC Davis & UC Berkeley | Founder, Pragmatic AI Labs

SLIDE 2

Computer User

  • log files
  • build artifacts
  • directory trees
  • structured data
  • unstructured data
  • ML models

SLIDE 3

Filesystem

A file system is a hierarchy

The Unix tree command

    ├── Makefile
    ├── README.md
    ├── demos
    │   ├── flask-sklearn
    │   │   ├── Dockerfile
    │   │   ├── Makefile
    │   │   ├── README.md
    │   │   ├── app.py
    │   │   ├── ml_prediction.joblib

SLIDE 4

Human User

  • config files
  • user profile data
  • business documents
  • code
  • data science projects
  • ML models

SLIDE 5

Leaning into os.walk

os.walk returns:

  • root
  • dirs
  • files

Returns a generator

    import os

    # a generator yields one result at a time
    foo = os.walk("/tmp")
    type(foo)

    generator
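
A minimal sketch of consuming that generator, assuming a throwaway directory built with tempfile. The helper name list_tree and the file names are hypothetical, chosen only for illustration; each step of the walk yields one (root, dirs, files) tuple.

```python
import os
import tempfile


def list_tree(top):
    """Collect the (root, dirs, files) tuples yielded by os.walk."""
    results = []
    for root, dirs, files in os.walk(top):
        dirs.sort()  # sorting in place also fixes the order os.walk descends in
        results.append((root, list(dirs), sorted(files)))
    return results


with tempfile.TemporaryDirectory() as top:
    # Hypothetical layout for illustration only
    os.makedirs(os.path.join(top, "demos"))
    open(os.path.join(top, "README.md"), "w").close()
    open(os.path.join(top, "demos", "app.py"), "w").close()

    walked = list_tree(top)
    for root, dirs, files in walked:
        print(root, dirs, files)
```

The first tuple describes the top directory itself; one further tuple follows for every directory beneath it.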

SLIDE 6

Finding file extensions

Splitting off a file extension

    import os

    fullpath = "/tmp/somestuff/data.csv"
    _, ext = os.path.splitext(fullpath)
    ext

    '.csv'
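
A short sketch of how os.path.splitext treats a few less obvious names. The paths are hypothetical; only the last dot-separated suffix counts as the extension, and a leading dot in the file name is not treated as one.

```python
import os

# Hypothetical paths used only to show how splitext behaves
examples = [
    "/tmp/somestuff/data.csv",
    "/tmp/somestuff/archive.tar.gz",
    "/tmp/somestuff/.bashrc",
    "/tmp/somestuff/README",
]

# Map each path to the extension part returned by splitext
extensions = {path: os.path.splitext(path)[1] for path in examples}
for path, ext in extensions.items():
    print(f"{path!r} -> {ext!r}")
```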

SLIDE 7

Let's practice.

SLIDE 8

Find files matching a pattern

SLIDE 9

Using Path.glob()

Path.glob()

  • finds patterns in directories
  • yields matches
  • can recursively search

SLIDE 10

Simple glob patterns

    from pathlib import Path

    path = Path("data")
    list(path.glob("*.csv"))

    [PosixPath('data/mydata.csv'), PosixPath('data/yourdata.csv')]

SLIDE 11

Recursive glob patterns

    from pathlib import Path

    path = Path("data")
    list(path.glob("**/*.csv"))

    [PosixPath('data/one.csv'), PosixPath('data/moredata/two.csv')]
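
To compare the two patterns side by side, here is a sketch that builds a small temporary tree and globs it both ways. The helper find_csv and the file names are hypothetical, not part of the slides.

```python
import tempfile
from pathlib import Path


def find_csv(base, recursive=False):
    """Return sorted CSV paths under base, expressed relative to base."""
    pattern = "**/*.csv" if recursive else "*.csv"
    return sorted(p.relative_to(base).as_posix() for p in base.glob(pattern))


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    # Hypothetical layout for illustration only
    (base / "moredata").mkdir()
    (base / "one.csv").touch()
    (base / "notes.txt").touch()
    (base / "moredata" / "two.csv").touch()

    top_level = find_csv(base)
    everything = find_csv(base, recursive=True)
    print(top_level)
    print(everything)
```

The plain pattern only sees files directly inside the directory, while the `**/` prefix also descends into subdirectories.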

SLIDE 12

Using os.walk to find patterns

os.walk pattern matching

  • more explicit
  • can explicitly look at directories or files
  • doesn't return Path objects

    import os

    result = os.walk("/tmp")
    # consume the first item of the generator
    next(result)
    # Find your pattern here...

SLIDE 13

Using fnmatch

  • Supports Unix shell wildcard matches
  • Can be converted to a regular expression

    import fnmatch

    # file is a file name and log is a logger, both defined elsewhere
    if fnmatch.fnmatch(file, "*.csv"):
        log.info(f"Found match {file}")
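
One way to combine os.walk with fnmatch, sketched under the assumption that only file names (not directory names) should be matched. The helper find_matches and the sample tree are hypothetical.

```python
import fnmatch
import os
import tempfile


def find_matches(top, pattern):
    """Walk top and return sorted full paths whose file name matches pattern."""
    matches = []
    for root, _dirs, files in os.walk(top):
        # fnmatch.filter keeps only the names matching the shell-style pattern
        for name in fnmatch.filter(files, pattern):
            matches.append(os.path.join(root, name))
    return sorted(matches)


with tempfile.TemporaryDirectory() as top:
    # Hypothetical layout for illustration only
    os.makedirs(os.path.join(top, "sub"))
    for relative in ("a.csv", os.path.join("sub", "b.csv"), os.path.join("sub", "c.txt")):
        open(os.path.join(top, relative), "w").close()

    found = [os.path.relpath(path, top) for path in find_matches(top, "*.csv")]
    print(found)
```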

SLIDE 14

Converting fnmatch to regular expression

fnmatch.translate converts pattern to regex

    import fnmatch
    import re

    regex = fnmatch.translate('*.csv')
    pattern = re.compile(regex)
    print(pattern)

    re.compile(r'(?s:.*\.csv)\Z', re.UNICODE)

    pattern.match("titanic.csv")

    <re.Match object; span=(0, 11), match='titanic.csv'>
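
A compiled pattern is handy when the same wildcard is applied to many names. The sketch below filters a hypothetical list of file names; because the translated expression is anchored at the end, a name such as data.csv.bak does not match.

```python
import fnmatch
import re

# Hypothetical file names for illustration only
names = ["titanic.csv", "notes.txt", "data.csv.bak", "iris.csv"]

# Translate the shell wildcard once, then reuse the compiled regex
pattern = re.compile(fnmatch.translate("*.csv"))
csv_names = [name for name in names if pattern.match(name)]
print(csv_names)
```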

SLIDE 15

Let's practice!

SLIDE 16

High-level file and directory operations

SLIDE 17

Two powerful modules

shutil: high-level file operations

  • copy tree
  • delete tree
  • archive tree

tempfile: generates temporary files and directories
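
The slides do not show tempfile in use, so the following is only a sketch of two common entry points, TemporaryDirectory and NamedTemporaryFile. The file name scratch.txt and the .csv suffix are hypothetical choices.

```python
import tempfile
from pathlib import Path

# TemporaryDirectory removes the directory and its contents on exit
with tempfile.TemporaryDirectory() as tmp_dir:
    scratch = Path(tmp_dir) / "scratch.txt"  # hypothetical file name
    scratch.write_text("temporary data")
    existed_inside = scratch.exists()

existed_after = scratch.exists()

# NamedTemporaryFile gives a real file on disk with a generated name
with tempfile.NamedTemporaryFile(suffix=".csv") as tmp_file:
    tmp_name = tmp_file.name

print(existed_inside, existed_after, tmp_name.endswith(".csv"))
```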

SLIDE 18

Using shutil.copytree

Can recursively copy a tree of files and folders

    from shutil import copytree, ignore_patterns

Can ignore patterns

    copytree(source, destination, ignore=ignore_patterns('*.txt', '*.excel'))
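
A runnable sketch of the same call against a temporary tree, assuming the destination does not exist yet. The directory and file names are hypothetical; note that ignore_patterns is applied at every level of the tree.

```python
import tempfile
from pathlib import Path
from shutil import copytree, ignore_patterns

with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "sometree"
    destination = Path(tmp) / "newtree"  # must not exist before copytree runs
    # Hypothetical layout for illustration only
    (source / "nested").mkdir(parents=True)
    (source / "keep.csv").touch()
    (source / "skip.txt").touch()
    (source / "nested" / "also_skip.txt").touch()

    copytree(source, destination, ignore=ignore_patterns("*.txt"))
    copied = sorted(p.relative_to(destination).as_posix() for p in destination.rglob("*"))
    print(copied)
```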

SLIDE 19

copytree in action

    In [1]: pwd
    Out[1]: '/private/tmp'

    In [2]: !mkdir sometree && touch sometree/somefile.txt

    In [3]: from shutil import copytree

    In [5]: copytree("sometree", "newtree")
    Out[5]: 'newtree'

    In [6]: !ls -l newtree/
    total 0
    -rw-r--r--  1 noahgift  wheel  0 May 19 20:08 somefile.txt

SLIDE 20

Using shutil.rmtree

Can recursively delete a tree of files and folders

    from shutil import rmtree

    # rmtree takes only the path of the tree to delete
    rmtree(source)
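
A small sketch of rmtree on a directory created with tempfile.mkdtemp, which is not cleaned up automatically. The nested file name is hypothetical.

```python
import tempfile
from pathlib import Path
from shutil import rmtree

# mkdtemp creates a directory that the caller is responsible for removing
tree = Path(tempfile.mkdtemp())
(tree / "nested").mkdir()
(tree / "nested" / "somefile.txt").touch()  # hypothetical file name

existed_before = tree.exists()
rmtree(tree)  # deletes the directory and everything beneath it
exists_after = tree.exists()
print(existed_before, exists_after)
```

Because the deletion is recursive and permanent, it is worth double-checking the path before calling rmtree.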

SLIDE 21

Using shutil.make_archive

Archiving a tree with make_archive

    from shutil import make_archive

    make_archive("somearchive", "gztar", "inside_tmp_dir")

    '/tmp/somearchive.tar.gz'
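
To check that an archive round-trips, one option is to unpack it again with shutil.unpack_archive. This is a sketch under the assumption that gzip support is available; the restored directory and the file contents are hypothetical.

```python
import tempfile
from pathlib import Path
from shutil import make_archive, unpack_archive

with tempfile.TemporaryDirectory() as tmp_name:
    base = Path(tmp_name)
    tree = base / "inside_tmp_dir"
    tree.mkdir()
    (tree / "somefile.txt").write_text("hello")  # hypothetical content

    # base_name has no extension; make_archive appends ".tar.gz" for "gztar"
    archive_path = make_archive(str(base / "somearchive"), "gztar", root_dir=str(tree))

    # Unpack into a separate directory and read the file back
    restored = base / "restored"
    unpack_archive(archive_path, str(restored))
    restored_text = (restored / "somefile.txt").read_text()
    archive_name = Path(archive_path).name

print(archive_name, restored_text)
```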

SLIDE 22

Automation Takeaways

Use the Python standard library

  • If an automation task requires a lot of code, the approach may be incorrect
  • Consult the Python standard library
  • Look at 3rd party Python libraries
  • The less code you write, the fewer bugs you have

SLIDE 23

Practicing high-level automation

SLIDE 24

Using pathlib

SLIDE 25

Using pathlib.Path

    from pathlib import Path

Make a path object

    path = Path("/usr/bin")

List items in the directory as Path objects

    list(path.glob("*"))[0:4]

    [PosixPath('/usr/bin/link'), PosixPath('/usr/bin/tput'),

SLIDE 26

Working with PosixPath objects

    mypath.cwd()

    PosixPath('/app')

    mypath.exists()

    True

SLIDE 27

More PosixPath

    mypath.as_posix()

    '/usr/bin/link'

SLIDE 28

Open a file with pathlib

Open a Makefile from a path object

    from pathlib import Path

    some_file = Path("Makefile")

Print the last line of the Makefile

    with some_file.open() as file_to_read:
        print(file_to_read.readlines()[-1:])

    ['all: install lint test\n']

SLIDE 29

Create a directory with pathlib

Path objects can create directories

    from pathlib import Path

    tmp = Path("/tmp/inside_tmp_dir")
    tmp.mkdir()

Contents of the directory

    ls -l /tmp/

    inside_tmp_dir/
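
A plain mkdir() raises if the directory already exists or if a parent is missing. The sketch below shows the parents and exist_ok arguments that relax both conditions; the nested directory names are hypothetical.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as base:
    nested = Path(base) / "inside_tmp_dir" / "reports"  # hypothetical names

    # parents=True creates missing intermediate directories;
    # exist_ok=True turns a repeated call into a no-op
    nested.mkdir(parents=True, exist_ok=True)
    nested.mkdir(parents=True, exist_ok=True)
    created = nested.is_dir()

    try:
        nested.mkdir()  # plain mkdir() on an existing directory raises
        raised = False
    except FileExistsError:
        raised = True

print(created, raised)
```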

SLIDE 30

Write text with pathlib

write_text() is a serious shortcut

    from pathlib import Path

    write_path = Path("/tmp/some_random_file.txt")
    write_path.write_text("Wow")

    3

    print(write_path.read_text())

    Wow

SLIDE 31

Rename a file with pathlib

Renaming a file with pathlib

    from pathlib import Path

    # Create a Path object
    modify_file = Path("/tmp/some_random_file.txt")
    # Rename the file
    modify_file.rename("/tmp/some_random_file_renamed.txt")

    ls /tmp

    some_random_file_renamed.txt
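
A self-contained sketch of the write-then-rename sequence inside a temporary directory, so nothing is left behind in /tmp. It uses with_name (a pathlib method not shown in the slides) to build the new name next to the original.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as base:
    original = Path(base) / "some_random_file.txt"
    original.write_text("Wow")

    # Build the new name in the same directory, then rename.
    # The original Path object is not updated; it still points at the old name.
    renamed = original.with_name("some_random_file_renamed.txt")
    original.rename(renamed)

    old_exists = original.exists()
    new_text = renamed.read_text()

print(old_exists, new_text)
```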

SLIDE 32

Practicing with pathlib