Outline 1. Install Python and some libraries 2. Use and extend - - PowerPoint PPT Presentation



SLIDE 1

Machine Learning for Trading Software Platform Introduction & Installation

Outline

  • 1. Install Python and some libraries
  • 2. Use and extend templates
  • 3. Create a workflow of modules needed for analyzing market behavior: getting data, building a portfolio, analyzing it, and computing expected return.

– Read a CSV file into Python
– Create an analyzer that enables you to assess a portfolio

  • Edit the analysis.py file

– Create an optimizer

Projects (tentative)

  • Assess Portfolio (5%)
  • Assess Learners & Defeat Learners (25%)
  • Market Simulator (10%)
  • Q-Learning Robot (10%)
  • Strategy Learner (15%)

Installation:

Step 1: Install your Python platform
a) Install Anaconda
Step 2 (later): Install the Market Simulator templates. They need SciPy.

Note: The Anaconda Python distribution includes NumPy, Pandas, SciPy, Matplotlib, and Python, plus over 250 more packages available via a simple “conda install <packagename>”. It also has an IDE. The instructor used Python 2.7 with the Anaconda distribution.

To get the appropriate software you’ll need: Python (scripting/programming language), SciPy (numerical routines), NumPy (matrices, linear algebra), and Matplotlib (generates plots of data).

Installing Python (2.7) via Anaconda: the Anaconda instruction site, which bundles many libraries with Python: https://docs.continuum.io/anaconda/install

Mac Installation:
1) Instructions that the instructor used:
a) Installed Anaconda (got required packages): https://www.continuum.io/downloads (2.7), which includes SciPy, NumPy, and Matplotlib

Alternate Setup (pip):

http://quantsoftware.gatech.edu/Undergrad_ML4T_Software_Setup http://quantsoftware.gatech.edu/ML4T_Software_Installation


SLIDE 2

Overview

  • Read Data: Read stock data from a CSV file and input it into a pandas DataFrame

– pandas.DataFrame
– pandas.read_csv

  • Select Subsets of Data: Select desired rows and columns

– Indexing and slicing data
– Gotchas: Label-based slicing convention

  • Generate Useful Plots: Visualize data by generating plots

– Plotting
– pandas.DataFrame.plot
– matplotlib.pyplot.plot

Workflow

  • Scrape the S&P 500 ticker list and industry sectors from the list of S&P 500 companies on Wikipedia (code provided).

– https://en.wikipedia.org/wiki/List_of_S%26P_500_companies
– NOTE: for this class these files have already been downloaded and are available in a zip file.

  • Use the daily close data (adjusted close) for each industry sector from Yahoo Finance

– using pandas DataReader (again, you will use CSV files that have already been downloaded).

  • Build a sample portfolio (in lecture, by hand).
  • Look at measures of the performance of a portfolio (project 1). We will use the first measure for project 1.

– Sharpe ratio (in class)
– Treynor ratio
– Jensen’s alpha

  • Visualization via matplotlib

Goal

  • Go from RAW data (adjusted close prices in

a .csv file) all the way to visualization

First Something Familiar: Weather Data

  • .csv: Comma Separated Values of weather conditions from Oct 2009 to Aug 2017
  • Town of Cary, North Carolina

– Temperature, pressure, humidity, … let's see
– Import as “text data”

  • Next … stock data.

https://catalog.data.gov/dataset?res_format=CSV&tags=weather rdu-weather-history.csv

SLIDE 3

Getting Real Data

In Excel (use Text Import Wizard) Data-> Get External Data->Import Text File…

Comma Separated Values (.CSV)

  • CSV File
  • Header
  • Rows

– Rows of Dates

  • Each element is separated into columns

  • Shift-ctrl-down

rdu-weather-history.csv

What is in a Historical Stock Data File?

a) # of employees b) Date/Time c) Company Name d) Price of the Stock e) Company’s Hometown

What is in a Historical Stock Data File?

a) # of employees b) Date/Time c) Company Name (does not change over time) d) Price of the Stock e) Company’s Hometown (does not change over time)

Does not change over time Is it relevant or expected?

SLIDE 4

Comma Separated Values (.CSV)

  • Stock data from Yahoo Finance
  • CSV file pulled by pandas DataReader() (later)

https://finance.yahoo.com/quote/GOOG/history?ltr=1 GetWebS.py

Stock Data Files

  • Date
  • Open – price the stock opens at in the morning; it is the first price of the day.
  • High – highest price in the day
  • Low – lowest price in the day
  • Close – closing price at 4 PM.
  • Volume – how many shares traded altogether on that day.
  • Adjusted Close – accounts for splits and dividends; encapsulates the increase in value if you hold the stock for a long time (later).

http://www.investopedia.com/terms/a/adjusted_closing_price.asp

GOOG.csv (from Yahoo).

  • Newer dates on top, older descending.

https://finance.yahoo.com/quote/GOOG/history?ltr=1

  • Adjusted Close – adjusts for (accounts for) stock splits and dividend payments.
  • On the current day, Adjusted Close and Close are always the same.
  • Previous days:

– As we go back in time they start to differ; they are not always the same.
– Actual return is not captured by the closing price; use adjusted close on historical data.

https://finance.yahoo.com/quote/IBM/history

SLIDE 5

Pandas: Included in Anaconda

  • https://en.wikipedia.org/wiki/Pandas_(software)
  • Developed by Wes McKinney while at AQR

Capital Management to analyze financial data

– Python Spreadsheets – Open Source. – Numerical Tables and Time Series – A Key Element : Data Frames

  • Slicing

– Panel Data

Goal: Store Portfolio in a pandas DataFrame

  • Want: <Symbols> vs Time
  • Includes a set of equities (ownership)

– Exchange Traded Fund (ETF)
– SPY

  • Tracks the S&P 500 Index.

– Russell 1000
– AAPL – Apple
– GOOG – Google
– Other: securities (government)

  • NaN
  • https://en.wikipedia.org/wiki/Google

– Initial public offering (IPO): August 19, 2004.

[Figure labels: symbols, time, Adjusted close, Volume, Close]

Warm-up: Reading into a Data frame

  • Interactively

– Import pandas – Rename it to pd

  • Read it in.
  • First column is index helping

you to access rows.

  • Stocks (get data from zip file)

– SPY, – AAPL, – GOOG, – GLD

  • Link data directory.

https://finance.yahoo.com/quote/GOOG/history?ltr=1

Exercises

Exercise 1.

  • Read in the entire CSV file in a function

– print it out.

Exercise 2.

  • Read in the entire file in a function

– Print out a selection of file

  • Top 5 lines : .head()
  • Bottom 5 lines: .tail()
SLIDE 6

def -- Make it a function

  • If this file is run as the main program the

– if statement is true and calls test_run

  • simple-frame.py

– Entire frame – Try: printing - df.head(), df.tail()

  • Question: Print last 5 lines?

simple-frame.py

  • Only print top 5 line of data frame

– print df.head()

  • Only print bottom 5 lines of data frame

– print df.tail()
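The read-then-inspect pattern described above can be sketched as follows. This is a minimal illustration, not the contents of simple-frame.py: the inline sample rows are made up, `read_prices` is a hypothetical helper name, and the code uses the call form of `print` (Python 3), whereas the slides use Python 2.7 print statements.

```python
import io
import pandas as pd

# Made-up rows for illustration only; in class the data would come from a
# file such as data/SPY.csv (path assumed, not confirmed by the slides).
SAMPLE_CSV = """Date,Open,High,Low,Close,Volume,Adj Close
2012-09-12,10.0,10.5,9.8,10.2,1000,10.2
2012-09-11,9.9,10.1,9.7,10.0,1100,10.0
2012-09-10,9.8,10.0,9.6,9.9,1200,9.9
2012-09-07,9.7,9.9,9.5,9.8,1300,9.8
2012-09-06,9.6,9.8,9.4,9.7,1400,9.7
2012-09-05,9.5,9.7,9.3,9.6,1500,9.6
2012-09-04,9.4,9.6,9.2,9.5,1600,9.5
"""


def read_prices(source):
    """Read a CSV (file path or file-like object) into a DataFrame."""
    return pd.read_csv(source)


def test_run():
    df = read_prices(io.StringIO(SAMPLE_CSV))
    print(df.head())  # top 5 rows
    print(df.tail())  # bottom 5 rows
    return df


if __name__ == "__main__":
    # True only when this file is run as the main program
    test_run()
```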

Print out a subset of columns, and/or rows:

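  • Slicing: Only print rows with index 10 through 20 (the end of the slice is not inclusive, so it is written as 21)

– print df[10:21] (rows 10 through 20)
– print df[:21] (rows 0 through 20)
– print df[['Date','High']].values[5] (the Date and High values of row 5)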

Computation on CSV File

  • From the file, find out maximum closing price.
  • 1. Read the file into a data frame
  • Now - SPY.csv
  • Later – any symbol.
  • 2. Process the Column ‘Close’
  • 3. Use pandas function .max() to return max.

Compute Max Closing Price get_max_close(symbol)

https://pyformat.info/ 1a-maxclosingprice.py
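A minimal sketch of the three steps above (read the file, select the 'Close' column, call `.max()`). It is not the contents of 1a-maxclosingprice.py: the sample rows are made up, and the `sources` parameter is a hypothetical stand-in for reading one CSV file per symbol from the data directory. A mean-volume helper is included because it follows the same pattern.

```python
import io
import pandas as pd

# Made-up rows for illustration; a real run would read one CSV per symbol.
SAMPLE = {
    "SPY": "Date,Close,Volume\n"
           "2012-09-10,143.5,100\n"
           "2012-09-11,144.4,200\n"
           "2012-09-12,144.0,300\n",
}


def get_max_close(symbol, sources=SAMPLE):
    """Return the maximum closing price for the given symbol."""
    df = pd.read_csv(io.StringIO(sources[symbol]))  # 1. read into a frame
    return df["Close"].max()                        # 2.-3. column, then max


def get_mean_volume(symbol, sources=SAMPLE):
    """Return the mean traded volume for the given symbol."""
    df = pd.read_csv(io.StringIO(sources[symbol]))
    return df["Volume"].mean()


for sym in ["SPY"]:
    print("Max close for {}: {}".format(sym, get_max_close(sym)))
    print("Mean volume for {}: {}".format(sym, get_mean_volume(sym)))
```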

SLIDE 7

Quiz

1) Calculate the mean volume.
2) Calculate the max adjusted close.
3) Challenge (bonus): Return the date(s) when the closing price is different from the adjusted closing price.

– IBM (try this stock)

1b-meanvolume-quiz.py

Plotting with matplotlib

http://matplotlib.org/users/pyplot_tutorial.html#working-with-text

2a-1column-plots.py

df = pd.read_csv("data/AAPL.csv", index_col=0)  # read in data

Plot 2 Columns in a single Plot

2b-2column-plots.py
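Plotting two columns in one figure might look like the sketch below. It is an illustration under assumptions, not the contents of 2b-2column-plots.py: the prices are made up, and the non-interactive "Agg" backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; drop for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Made-up prices for illustration only.
dates = pd.date_range("2012-09-10", periods=4)
df = pd.DataFrame(
    {"Close": [100.0, 101.0, 99.5, 102.0],
     "Adj Close": [98.0, 99.0, 97.6, 100.1]},
    index=dates,
)

# Selecting a list of columns yields a DataFrame; .plot() draws one line each.
ax = df[["Close", "Adj Close"]].plot(title="Close vs. Adj Close")
ax.set_xlabel("Date")
ax.set_ylabel("Price")
# plt.show() would open a window when an interactive backend is used.
```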

Building a Portfolio with Pandas: Create a DataFrame with Closing date of Different Stocks.

BUT → only on trading days …

SLIDE 8

How many days were US stocks traded in 2014 (over an entire year)? Really, any year …

a) 365
b) 260 (52 weeks x 5)
c) 252

3 steps

  • Restrict Data Ranges (e.g., specific date range)?

– idea:

  • Get the intersection by joining with other frames.
  • Skips dates
  • Drop Missing Data Rows (NaN).
  • Join Data Incrementally, column by column
SLIDE 9

Follow Along.

import pandas as pd

def test_run():
    start_date = '2010-01-22'
    end_date = '2010-01-26'
    dates = pd.date_range(start_date, end_date)
    df1 = pd.DataFrame(index=dates)  # empty frame indexed by the date range
    df2_SPY = pd.read_csv("data/SPY.csv", index_col="Date",
                          parse_dates=True,
                          usecols=['Date', 'Adj Close'],
                          na_values=['NaN'])  # 'NaN' should be read as not a number
    dfR = df1.join(df2_SPY)  # Step 1. join with restricted date frame
    print dfR                # Step 2. NaN are still in there.
    dfR = dfR.dropna()
    print dfR
    # dfR = df1.join(df2_SPY, how='inner')  # alternate: 2 steps in 1

if __name__ == "__main__":
    test_run()

3a-simple-join.py 3a-simple-joinKEY.py

Add in Multiple Stocks Iteratively

  • dfSPY = dfSPY.rename(columns={'Adj Close': 'SPY'})
  • Read SPY data into frame with restricted date

range

  • Iteratively add additional stocks: GOOG, IBM, GLD

and join into frame.

– Trick is to use rename to avoid column name clashes.

ValueError: columns overlap but no suffix specified: Index([u'Adj Close'], dtype='object')

3b-iterate-multiple.py
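The iterative join can be sketched as below. This is a minimal illustration, not the contents of 3b-iterate-multiple.py: the prices are made up, `get_data` and its `sources` parameter are hypothetical, and dropping only the dates on which SPY is missing (SPY as the trading-day reference) is an assumption.

```python
import io
import pandas as pd

# Made-up adjusted closes; GLD deliberately lacks 2010-01-25.
CSV_BY_SYMBOL = {
    "SPY": "Date,Adj Close\n2010-01-22,100.0\n2010-01-25,101.0\n2010-01-26,100.5\n",
    "GOOG": "Date,Adj Close\n2010-01-22,550.0\n2010-01-25,540.0\n2010-01-26,542.0\n",
    "GLD": "Date,Adj Close\n2010-01-22,107.0\n2010-01-26,107.5\n",
}


def get_data(symbols, dates, sources=CSV_BY_SYMBOL):
    """Join the 'Adj Close' column of each symbol onto a date-indexed frame."""
    df = pd.DataFrame(index=dates)
    for symbol in symbols:
        df_temp = pd.read_csv(io.StringIO(sources[symbol]), index_col="Date",
                              parse_dates=True, usecols=["Date", "Adj Close"],
                              na_values=["NaN"])
        # Rename so each join adds a uniquely named column; joining two
        # 'Adj Close' columns raises the "columns overlap" ValueError.
        df_temp = df_temp.rename(columns={"Adj Close": symbol})
        df = df.join(df_temp)
    # Assumption: keep only dates on which the reference symbol traded.
    return df.dropna(subset=["SPY"])


dates = pd.date_range("2010-01-22", "2010-01-26")
df = get_data(["SPY", "GOOG", "GLD"], dates)
print(df)
```

The weekend dates disappear because SPY has no rows for them, while the missing GLD value on 2010-01-25 stays as NaN for later handling.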

Utility Function 4b-get_data.py

Slicing.

  • execfile("4b-get-data.py")
  • Copy paste test_run()
  • df['GOOG'], df[['GOOG','IBM']]
  • df1.ix is deprecated in favor of the iloc and loc indexers
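A short sketch of column selection and of the two indexers that replace `.ix`. The frame and its values are made up for illustration. Note the slicing convention: a label-based `loc` slice includes both endpoints, while a position-based `iloc` slice excludes the end.

```python
import pandas as pd

# Made-up prices on four business days starting 2010-01-22.
dates = pd.date_range("2010-01-22", periods=4, freq="B")
df = pd.DataFrame(
    {"SPY": [100.0, 101.0, 100.5, 102.0],
     "GOOG": [550.0, 540.0, 542.0, 545.0],
     "IBM": [120.0, 121.0, 119.5, 122.0]},
    index=dates,
)

one_col = df["GOOG"]             # single label -> Series
two_cols = df[["GOOG", "IBM"]]   # list of labels -> DataFrame

# Label-based: both '2010-01-25' and '2010-01-26' are included.
by_label = df.loc["2010-01-25":"2010-01-26", ["GOOG", "IBM"]]

# Position-based: rows 1 and 2, columns 1 and 2 (end index excluded).
by_pos = df.iloc[1:3, 1:3]

print(by_label)
print(by_pos)
```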

SLIDE 10

Visualization

  • 6-comparestocks.py
  • 6-comparestocks-normalized.py
  • Normalization

– df = df / df.ix[0,:] – According to first day.
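Normalizing by the first day can be sketched as follows, using `iloc` in place of the deprecated `.ix`. The prices are made up; this is an illustration, not the contents of 6-comparestocks-normalized.py.

```python
import pandas as pd

# Made-up prices for illustration only.
df = pd.DataFrame(
    {"SPY": [100.0, 102.0, 101.0],
     "GOOG": [500.0, 510.0, 495.0]},
    index=pd.date_range("2010-01-25", periods=3),
)

# Divide every row by the first row, column by column, so that each
# stock starts at 1.0 and the curves become directly comparable.
normalized = df / df.iloc[0, :]
print(normalized)
```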

NumPy

  • Multidimensional array

– ndarray

  • Functions, attributes

– shape, ndim, size
– numpy.random
– sum, min, max

  • Examples:

– np.ones((5,4), dtype=np.int_)
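The attributes and functions listed here can be tried out with a short sketch. The seed value and array sizes are arbitrary choices for illustration.

```python
import numpy as np

a = np.ones((5, 4), dtype=np.int_)   # 5 rows, 4 columns of integer ones
print(a.shape, a.ndim, a.size)       # dimensions, number of axes, element count

np.random.seed(0)                    # arbitrary seed, for repeatability only
r = np.random.random((5, 4))         # uniform samples in [0.0, 1.0)

print(a.sum())                       # sum of all elements
print(a.sum(axis=0))                 # column sums
print(r.min(), r.max(), r.mean())    # global statistics of the random array
```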

  • Bollinger Bands

– Upper band :

  • rolling mean + 2 * rolling

StdDev

– Lower band :

  • rolling mean – 2 * rolling

StdDev

https://en.wikipedia.org/wiki/Bollinger_Bands
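A minimal sketch of the band formulas above using pandas rolling windows. The function name `bollinger_bands`, the default window of 20, and the sample prices are assumptions for illustration; pandas' rolling `std()` uses the sample standard deviation by default.

```python
import pandas as pd


def bollinger_bands(prices, window=20):
    """Return (upper, lower) bands: rolling mean +/- 2 * rolling std."""
    rolling_mean = prices.rolling(window=window).mean()
    rolling_std = prices.rolling(window=window).std()
    upper = rolling_mean + 2 * rolling_std
    lower = rolling_mean - 2 * rolling_std
    return upper, lower


# Made-up prices; a window of 3 keeps the example small.
prices = pd.Series([10.0, 11.0, 12.0, 11.0, 10.0, 9.0])
upper, lower = bollinger_bands(prices, window=3)
print(pd.DataFrame({"price": prices, "upper": upper, "lower": lower}))
```

The first `window - 1` entries of each band are NaN because a full window is not yet available.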

SLIDE 11

Rolling statistics

  • Compute / code financial statistics in pandas and NumPy:

– Global Statistics

  • Mean
  • Median
  • Standard Deviation

– Rolling Statistics

  • Rolling mean

– Representation of the underlying value of a stock

  • Rolling standard deviation

– Deviation from the mean (buy and sell signal)

Portfolio Statistics

  • Daily Return
  • Cumulative Returns

– From beginning to end: (last value / initial value) - 1

  • cum_ret = (port_val[-1]/port_val[0]) - 1
  • Average Daily Returns

– daily_rets.mean()

  • Standard Deviation of Daily Return

– daily_rets.std()

  • Sharpe Ratio (later).

60-daily-return.py
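The four statistics above can be sketched on a tiny made-up portfolio value series. This is an illustration, not the contents of 60-daily-return.py. Positional access uses `.iloc`, since `port_val[-1]` on a pandas Series with an integer index is a label lookup rather than "last element".

```python
import pandas as pd

# Made-up daily portfolio values.
port_val = pd.Series([100.0, 110.0, 99.0, 108.9])

# Daily return: today's value over yesterday's, minus 1.
daily_rets = (port_val / port_val.shift(1)) - 1
daily_rets = daily_rets.iloc[1:]          # the first day has no previous day

cum_ret = (port_val.iloc[-1] / port_val.iloc[0]) - 1   # cumulative return
avg_daily_ret = daily_rets.mean()                      # average daily return
std_daily_ret = daily_rets.std()                       # std of daily return

print(cum_ret, avg_daily_ret, std_daily_ret)
```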

  • Missing Data

– Fill forward – Fill backward

fakedata.py fakedata2.py
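A small sketch of filling gaps, assuming the common order of forward fill first and backward fill second. The series is made up; it is not taken from fakedata.py.

```python
import numpy as np
import pandas as pd

# Made-up prices with a leading gap, an interior gap, and a trailing gap.
prices = pd.Series([np.nan, 10.0, np.nan, np.nan, 12.0, np.nan])

# Forward fill carries the last known price across interior and trailing
# gaps; backward fill then covers any gap at the very start of the series.
filled = prices.ffill().bfill()
print(filled.tolist())
```

Filling forward first means a gap is bridged with information that was already known on that date; only leading gaps, which have no earlier value, borrow from the future.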

Plotting by Example

  • Histograms
  • Scatterplots
SLIDE 12

Sharpe Ratio

  • Considers our return in the context of risk
  • Risk is volatility (standard deviation)
  • Adjusts our return for the risk
  • Volatility

– Measured by standard deviation

Sharpe Ratio

  • 1. Higher returns are better
  • 2. Less volatility / less risk is better
  • 3. Not enough information

– Returns: ABC > XYZ
– Volatility: ABC > XYZ
– ABC has higher returns, but more risk


Sharpe Ratio

  • Adjusts return for risk
  • A quantitative way to assess a portfolio

– 1. ABC is better because it has the same volatility but higher returns
– 2. Same returns, but XYZ has lower risk, so XYZ is better
– 3. A quantity such as the Sharpe Ratio gives you a number to determine which is better

  • Sharpe ratio also considers (comparative)

– Risk-free rate of return

  • Bank account or Treasury note

– Lately the risk-free return is 0: the bank interest rate is 0, or close to 0

SLIDE 13

Which Formula is Best?

  • Rp : Portfolio Return
  • Rf : Risk Free Rate of Return
  • σp : Standard Deviation of Portfolio Return

a) Rp – Rf + σp b) Rf / Rf - σp c) (Rp – Rf) / σp

General Form of the Sharpe Ratio

Computing Sharpe Ratio

  • SR (expected value)

= E[Rp – Rf] / std[Rp – Rf]

  • Expected value → mean over time:

= mean(daily_rets – daily_rf) / std(daily_rets – daily_rf)

  • What is the risk-free rate?

– LIBOR (London Interbank Offered Rate)
– Interest rate: 3-month Treasury bill
– 0%! Shortcut.

  • Risk-free rate is not given on a daily basis

– LIBOR
– Annual / 6-month basis
– Shortcut:

  • Convert the annual rate to a daily amount
  • Example:

– Annual rate: 0.1 per year risk-free rate
– Total value at end of year: 1.0 + 0.1
– What is the interest rate per day?
» Daily_RF = (1.0 + 0.1)^(1/252) – 1 ≈ 0.0 (approximation), i.e. the 252nd root of (1.0 + 0.1), minus 1

– Constant Standard Deviation of a Constant
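The daily form of the Sharpe ratio and the annual-to-daily conversion can be sketched as below. The function names and the sample returns are assumptions for illustration; the formulas follow the slide: mean(daily_rets – daily_rf) / std(daily_rets – daily_rf), with the daily rate taken as the 252nd root of (1 + annual rate), minus 1.

```python
import pandas as pd


def daily_risk_free(annual_rate, trading_days=252):
    """Convert an annual risk-free rate to an equivalent daily rate."""
    return (1.0 + annual_rate) ** (1.0 / trading_days) - 1.0


def sharpe_ratio(daily_rets, daily_rf=0.0):
    """Mean excess daily return divided by its standard deviation."""
    excess = daily_rets - daily_rf
    return excess.mean() / excess.std()


# Made-up daily returns for illustration only.
daily_rets = pd.Series([0.01, -0.005, 0.007, 0.002])

rf = daily_risk_free(0.1)   # a small positive number, close to 0
print(rf)
print(sharpe_ratio(daily_rets, rf))
print(sharpe_ratio(daily_rets))  # shortcut: treat the risk-free rate as 0
```

Because a constant daily rate has zero standard deviation, subtracting it shifts the mean of the returns but leaves the denominator unchanged.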