Introduction to Python for Econometrics, Statistics and Data Analysis

Kevin Sheppard
University of Oxford
Saturday 12th October, 2013

2012, 2013 Kevin Sheppard

Notes to the 2nd Edition

This edition includes the following changes from the first edition (March 2012):

• The preferred installation method is
now Continuum Analytics' Anaconda. Anaconda is a complete scientific stack and is available for all major platforms.
• New chapter on pandas. pandas provides a simple but powerful tool to manage data and perform basic analysis. It also greatly simplifies importing and exporting data.
• New chapter on advanced selection of elements from an array.
• Numba provides just-in-time compilation for numeric Python code, which often produces large performance gains when pure NumPy solutions are not available (e.g. looping code).
• Dictionary, set and tuple comprehensions
• Numerous typos
• All code has been verified working against Anaconda 1.7.0.

Contents

1 Introduction
  1.1 Background
  1.2 Conventions
  1.3 Important Components of the Python Scientific Stack
  1.4 Setup
  1.5 Testing the Environment
  1.6 Python Programming
  1.7 Exercises
  1.A register_python.py
2 Python 2.7 vs. 3 (and the rest)
  2.1 Python 2.7 vs. 3
  2.2 Intel Math Kernel Library and AMD Core Math Library
  2.3 Other Variants
  2.A Relevant Differences between Python 2.7 and 3
3 Built-in Data Types
  3.1 Variable Names
  3.2 Core Native Data Types
  3.3 Python and Memory Management
  3.4 Exercises
4 Arrays and Matrices
  4.1 Array
  4.2 Matrix
  4.3 1-dimensional Arrays
  4.4 2-dimensional Arrays
  4.5 Multidimensional Arrays
  4.6 Concatenation
  4.7 Accessing Elements of an Array
  4.8 Slicing and Memory Management
  4.9 import and Modules
  4.10 Calling Functions
  4.11 Exercises
5 Basic Math
  5.1 Operators
  5.2 Broadcasting
  5.3 Array and Matrix Addition (+) and Subtraction (-)
  5.4 Array Multiplication (*)
  5.5 Matrix Multiplication (*)
  5.6 Array and Matrix Division (/)
  5.7 Array Exponentiation (**)
  5.8 Matrix Exponentiation (**)
  5.9 Parentheses
  5.10 Transpose
  5.11 Operator Precedence
  5.12 Exercises
6 Basic Functions and Numerical Indexing
  6.1 Generating Arrays and Matrices
  6.2 Rounding
  6.3 Mathematics
  6.4 Complex Values
  6.5 Set Functions
  6.6 Sorting and Extreme Values
  6.7 Nan Functions
  6.8 Functions and Methods/Properties
  6.9 Exercises
7 Special Arrays
  7.1 Exercises
8 Array and Matrix Functions
  8.1 Views
  8.2 Shape Information and Transformation
  8.3 Linear Algebra Functions
  8.4 Exercises
9 Importing and Exporting Data
  9.1 Importing Data using pandas
  9.2 Importing Data without pandas
  9.3 Saving or Exporting Data using pandas
  9.4 Saving or Exporting Data without pandas
  9.5 Exercises
10 Inf, NaN and Numeric Limits
  10.1 inf and NaN
  10.2 Floating point precision
  10.3 Exercises
11 Logical Operators and Find
  11.1 >, >=, <, <=, ==, !=

[...]

When a code block contains >>>, this indicates that
the command is running in an interactive IPython session. Output will often appear after the console command, and will not be preceded by a command indicator.

    >>> x = 1.0
    >>> x + 2
    3.0

If the code block does not contain the console session indicator, the code contained in the block is intended to be
executed in a standalone Python file.

    from __future__ import print_function
    import numpy as np

    x = np.array([1, 2, 3, 4])
    y = np.sum(x)
    print(x)
    print(y)

1.3 Important Components of the Python Scientific Stack

1.3.1 Python

Python 2.7.5 (or later, but in the Python 2.7.x family) is required. This provides the core
Python interpreter.

1.3.2 NumPy

NumPy provides a set of array and matrix data types which are essential for statistics, econometrics and data analysis.

1.3.3 SciPy

SciPy contains a large number of routines needed for analysis of data. The most important include a wide range of random number generators, linear algebra routines and optimizers. SciPy depends on NumPy.

1.3.4 IPython

IPython provides an interactive Python environment which enhances productivity when developing code or performing interactive data analysis.

1.3.5 matplotlib

matplotlib provides a plotting environment for 2D plots, with limited support for 3D plotting.

1.3.6 pandas

pandas provides high-performance data structures.

1.3.7 Performance Modules

A number of modules are available to help with performance. These include Cython and Numba. Cython is a Python module which facilitates using a simple Python-derived creole to write functions that can be compiled to native (C code) Python extensions. Numba uses a method of just-in-time compilation to translate a subset of Python to native code using the Low-Level Virtual Machine (LLVM).

1.4 Setup

The recommended method to install
the Python scientific stack is to use Continuum Analytics' Anaconda. Instructions are also provided for directly installing Python and the required modules if it isn't possible to install Anaconda.

1.4.1 Continuum Analytics Anaconda

Anaconda, a free product of Continuum Analytics (www.continuum.io), is a virtually complete scientific stack for Python. It includes both the core Python interpreter and standard libraries as well as most modules required for data analysis. Anaconda is free to use, and modules for accelerating the performance of linear algebra on Intel processors using the Math Kernel Library (MKL) are available (free to academic users and for a small cost to non-academic users). Continuum Analytics also provides other high-performance modules for reading large data files or using the GPU to further accelerate performance for an additional, modest charge. Most importantly, installation is extraordinarily easy on Windows, Linux and OSX. Anaconda is also simple to update to the latest version using

    conda update conda
    conda update anaconda

Windows

Installation on Windows requires downloading the installer and running it. These instructions use ANACONDA to indicate the Anaconda installation directory (e.g. the default is C:\Anaconda). Once the setup has completed, open a command prompt (cmd.exe) and run

    cd ANACONDA
    conda update conda
    conda update anaconda
    conda create -n econometrics cython distribute ipython-notebook ipython-qtconsole jinja2 lxml matplotlib nose numba numexpr numpy pandas pip pygments pytables pywin32 scipy statsmodels xlrd xlwt

which will first ensure that Anaconda is up-to-date and then create a virtual environment named econometrics. The virtual environment provides a set of components which will not change even if Anaconda is updated. Using a virtual environment is a best practice and is important since component updates can lead to errors in otherwise working programs due to backward-incompatible changes in a module. The long list of modules in the conda create command includes all of those that will be used in these notes. It is also possible to install all available packages using the command conda create -n econometrics anaconda. The econometrics environment must be activated before use. This is accomplished by running

    ANACONDA\Scripts\activate.bat econometrics

from the command prompt, which prepends econometrics to the prompt as an indication that the virtual environment is active.

Figure 1.1: IPython running in the standard Windows console (cmd.exe).

Activate the econometrics environment and then run

    pip install openpyxl

which installs two packages not directly available in Anaconda. The
final step is to create launchers for both the virtual environment and the IPython interactive Python console. First, open a text editor, enter

    cmd /k ANACONDA\Scripts\activate econometrics

and save the file as ANACONDA\envs\econometrics\python-econometrics.bat. The batch file will
open a command prompt in the econometrics virtual environment. Right click on the batch file and select Send To, Desktop (Create Shortcut), which will place a shortcut on the desktop. Next, create a launcher to run IPython in the standard Windows cmd.exe console. Open a text editor, enter

    cmd /c ANACONDA\Scripts\activate econometrics & start ipython.exe --pylab

and save the file as ANACONDA\envs\econometrics\ipython-plain.bat. Finally, right click on ipython-plain.bat and select Send To, Desktop (Create Shortcut). The icon of the shortcut will be generic. If you want a more meaningful icon, select the properties of the shortcut, then Change Icon, navigate to c:\Anaconda\envs\econometrics\Menu and select IPython.ico. Opening the batch file should create a window similar to that in figure 1.1.

The Windows command interpreter (cmd.exe) is very limited compared to other platforms. Fortunately, cmd.exe can be replaced with an upgraded version known as Console2. To use Console2, extract the contents of the zip file Console-2.00b148-Beta_64bit.zip (for example, to ANACONDA\Console2). Launch Console.exe, and select Edit, Settings, Tabs. Click on Add, and input the following:

    Title        IPython (Pylab)
    Icon         Navigate to ANACONDA\envs\econometrics\Menu and select IPython.ico.
    Shell        cmd /k ANACONDA\Scripts\activate.bat econometrics & python ANACONDA\envs\econometrics\Scripts\ip
    Startup dir  ANACONDA\envs\econometrics

This environment can be accessed by setting IPython (Pylab) as the default tab in Console2, or by explicitly opening a new tab with this environment.

Figure 1.2: IPython running in a QtConsole session.

A third option, known as the QtConsole, is provided by IPython. The QtConsole offers additional features such
as running multiple sessions simultaneously or having figures appear inline with code. Begin by entering the following command in a text editor,

    cmd /c cd ANACONDA\Scripts & activate econometrics & start pythonw ANACONDA\envs\econometrics\Scripts\ipython-script.py qtconsole --pylab=qt4 --colors=linux --ConsoleWidget.font_size=11 --ConsoleWidget.font_family="Bitstream Vera Sans Mono"

and then save the file as ANACONDA\envs\econometrics\ipython-qtconsole.bat. Create a shortcut for this batch file, and change the icon if desired. The trailing options, such as --colors=linux, affect the visual
appearance of the QtConsole. The options listed here are my preferred setup, and assume that the free font Bitstream Vera Sans Mono has been installed. Opening the batch file should create a window similar to that in figure 1.2.

Linux and OSX

Installation on Linux requires executing

    bash Anaconda-x.y.z-Linux-ISA.sh

where x.y.z will depend on the version being installed and ISA will be either x86 or, more likely, x86_64. The OSX installer is available either as a GUI installer (pkg format) or as a bash installer which is installed in an analogous manner to the Linux installation. After installation completes, change to the folder where Anaconda installed (written here as ANACONDA, default ~/anaconda) and execute

    cd ANACONDA
    cd bin
    ./conda update conda
    ./conda update anaconda
    ./conda create -n econometrics cython distribute ipython-notebook ipython-qtconsole jinja2 lxml matplotlib nose numba numexpr numpy pandas pip pygments pytables scipy statsmodels xlrd xlwt

which will first ensure that Anaconda is up-to-date and then create a virtual environment named econometrics with the required packages. To activate the newly created environment,
run

    source ANACONDA/bin/activate econometrics

and then run the command

    pip install openpyxl

to install two packages not included in Anaconda. The standard IPython environment can be launched in the system console using

    ipython --pylab

or the IPython-provided QtConsole can be launched using

    ipython qtconsole --pylab

Further options can be passed to IPython to improve the appearance of the QtConsole. For example,

    ipython qtconsole --pylab=qt4 --colors=linux --ConsoleWidget.font_size=11 --ConsoleWidget.font_family="Bitstream Vera Sans Mono"

1.4.2 Installation without Anaconda

Anaconda greatly simplifies installing the scientific Python stack. However, there may be situations where installing Anaconda is not possible, and so (substantially more complicated) instructions are included for both Windows and Linux.

Windows

The Windows packages required for these notes, along with the version and Windows installation file of each, are:

    Package      Version   File name
    Python       2.7.5     python-2.7.5.amd64
    Setuptools   1.1.5     setuptools-1.1.5.win-amd64-py2.7
    Pip          1.4.1     pip-1.4.1.win-amd64-py2.7
    Virtualenv   1.10.1    virtualenv-1.10.1.win-amd64-py2.7
    Jinja2       2.7.1     Jinja2-2.7.1.win-amd64-py2.7.exe
    Tornado      3.1.1     tornado-3.1.1.win-amd64-py2.7.exe
    PyCairo      1.10.0    pycairo-1.10.0.win-amd64-py2.7
    PyZMQ        13.1.0    pyzmq-13.1.0.win-amd64-py2.7
    PyQt         4.9.6-1   PyQt-Py2.7-x64-gpl-4.9.6-1
    NumPy        1.7.1     numpy-MKL-1.7.1.win-amd64-py2.7
    SciPy        0.12.0    scipy-0.12.0.win-amd64-py2.7
    MatplotLib   1.3.0     matplotlib-1.3.0.win-amd64-py2.7
    pandas       0.12.0    pandas-0.12.0.win-amd64-py2.7
    IPython      1.1.0     ipython-1.1.0.win-amd64-py2.7

These remaining packages are optional and are only discussed in the final chapters related to performance.

    Package      Version   File name
    Performance
    Cython       0.19.1    Cython-0.19.1.win-amd64-py2.7
    LLVMPy       0.12.0    llvmpy-0.12.0.win-amd64-py2.7
    LLVMMath     0.1.1     llvmmath-0.1.1.win-amd64-py2.7
    Meta         0.1.0     meta-0.1.0dev.win-amd64-py2.7
    Numba        0.10.2    numba-0.10.2.win-amd64-py2.7
    pandas (Optional)
    Bottleneck   0.7.0     Bottleneck-0.7.0.win-amd64-py2.7
    NumExpr      2.2.1     numexpr-2.2.1.win-amd64-py2.7
    Patsy        0.2.1     patsy-0.2.1.win-amd64-py2.7
    Statsmodels  0.5.0     statsmodels-0.5.0.win-amd64-py2.7
    PyTables     3.0.0     tables-3.0.0.win-amd64-py2.7

Begin by installing Python, setuptools, pip and virtualenv. After these four packages are installed, open an elevated command prompt (cmd.exe with administrator privileges) and initialize the virtual environment using the commands:

    cd C:\Dropbox
    virtualenv econometrics

I prefer to use my Dropbox as the location for virtual environments and have named the virtual environment econometrics. The virtual environment can be located anywhere (although best practice is to use a path without spaces) and can have a different name. Throughout the remainder of this section, VIRTUALENV will refer to the complete directory containing the virtual environment (e.g. C:\Dropbox\econometrics).

Once the virtual environment setup is complete, run

    cd VIRTUALENV\Scripts
    activate.bat
    pip install xlrd xlwt openpyxl pyreadline python-dateutil pytz==2013d pygments pyparsing

which activates the virtual environment and installs some additional required packages. Finally, before installing the remaining packages, it is necessary to register the virtual environment as the de