SMUSM
Process Management & IPC in Multiprocessor Operating Systems
Presented by Group A1
Garrick Williamson
Brad Crabtree
Alex MacFarlane

Process Management & IPC Intro (Focus on Solaris)
Garrick Williamson

Introduction
  SunOS is the operating system component of the Solaris environment.
  It supports Symmetric Multiprocessing (SMP).
  See the diagram on the next slide for an example of an SMP system.
  The kernel runs equally on all processors within a tightly coupled shared memory multiprocessor system.
  Control flows are entirely threads, including interrupts.

SMP System Example

SunOS 5.0 Architecture
  In addition to kernel-level threads, SunOS also supports multiple threads of control, called lightweight processes (LWPs).
  There is one kernel thread for each LWP.
  The kernel threads are used when the LWPs perform system functions/calls.

SunOS Architecture Diagram

Synchronization
  Threads/processes synchronize in a variety of ways:
    Mutual exclusion locks
    Condition variables
    Counting semaphores
    Multiple-reader, single-writer locks
  The mutual exclusion and writer locks use a priority inheritance protocol in order to prevent priority inversion.

Solaris IPC
  Solaris provides the following mechanisms
  for IPC:
  Simple, but limited, mechanisms include:
    Signals
    Pipes and named pipes (FIFOs)
    Sockets
  More versatile mechanisms include:
    Message queues
    Shared memory (with memory-mapped files and IPC shared memory options)
    Semaphores

Simple IPC
  Pipes do not allow unrelated processes to communicate.
  Named pipes allow unrelated processes to communicate, but are not private channels.
  Using the kill function, processes may communicate with signals, but only through signal numbers.

Complex IPC
  Messaging allows formatted data streams to be sent to arbitrary processes.
  Semaphores allow process synchronization.
  Shared memory allows processes to share part of their virtual address space.

IRIX Process Management and IPC
Brad Crabtree

Outline
  Hardware Background
  Process Management Facilities
  Interprocess Communication Facilities

Large Scale Computing Machines a Reality
  The Avalon A12.
  The Cambridge Parallel
    Processing Gamma II Plus.
  The Compaq AlphaServer SC.
  The Fujitsu AP3000.
  The Fujitsu VPP5000 series.
  The Hitachi SR8000 system.
  The HP Exemplar V2600.
  The IBM RS/6000 SP.
  The NEC Cenju-4.
  The NEC SX-5.
  The Quadrics Apemille.
  The SGI Origin 2000 series.
  The Sun E10000 Starfire.
  The Tera/Cray SV1.
  The Tera/Cray T3E.
  The Tera MTA.
  June 2001: Raytheon installs an 1152-processor Origin 3000 series system at NOAA
    $67M
    900 BFLOPS
    2 PB tape library

SGI Origin Architecture
  ccNUMA (NUMALink)
  Non-blocking crossbar switches as an interconnect fabric
  1.6 GB-per-second crossbar switch

Switch versus Bus

“Cellular IRIX” Scheduler
  Facilities for improving scalability and locality
  Job priorities:
    Real-Time Jobs
    Batch Critical
    Time Share
    Batch
    Weightless
  User-level scheduler concept

Real-Time Jobs
  Global run queue replaced with an implicit binding scheme
    Improves cache affinity and scalability
    Binds the top N jobs, by priority, to N CPUs
  CPU is
    always available when a real-time job comes in, because the currently running job is of lower priority
  Real-time jobs always go to the same CPU

Hard Real-Time in IRIX
  REACT/PRO Extensions
    Lock processes and memory to CPUs
    Disable the IRIX scheduler and replace it with the Frame Scheduler, the Deadline Scheduler, or none (yours)
    Direct interrupts away from CPUs
    Deterministic interrupt latency

Time Sharing Scheduler
  Degrading priority replaced with an earnings model
  Distribution controlled by Virtual Multiprocessors (VMPs)
  At 1 Hz, VMPs balance run queues with nearest neighbors and push out extra work

Parallel Job Scheduling
  Gang scheduling replaced with nanothreads
  Space sharing over time sharing
  Job requests CPUs, gets the number available, and then the algorithm is re-blocked
  When a thread is preempted, its context is saved to shared memory and the user-level scheduler re-blocks again

Replicated Kernel Text
  Wired-in 16 MB TLB pair into kernel virtual memory space
  One read-only, one read-write
  TLB miss exception overhead is avoided

Memory Migration
  Trying to avoid memory hot spots
  Reference counters in hub (local/remote)
  Fast Block Transfer Engine
  Marks source page as poisoned
  Lazy TLB shootdown
  Hysteresis to manage frequent migration

Types of IPC & Compatibility

POSIX vs. IRIX Shared Memory
  POSIX
    Function Name    Purpose and Operation
    mmap(2)          Map a file or shared memory object into the address space.
    shm_open(2)      Create, or gain access to, a shared memory object.
    shm_unlink(2)    Destroy a shared memory object when no references to it remain open.
  IRIX
    Function Name    Purpose and Operation
    usconfig(3)      Establish the default size of an arena, the number of concurrent processes that can use it, and the features of IPC objects in it.
    usinit(3)        Create an arena or join an existing arena.
    usadd(3)         Join an existing arena.

usconfig options
    usconfig() Flag Name    Meaning
    CONF_INITSIZE           The initial size of the arena segment. The default is 64 KB. Often you know that more is needed.
    CONF_AUTOGROW           Whether or not the arena can grow automatically as more IPC objects or data objects are allocated (default: yes).
    CONF_INITUSERS          The largest number of concurrent processes that can use the
                            arena. The default is 8; if more processes than this will use IPC, the limit must be set higher.
    CONF_CHMOD              The effective file permissions on arena access. The default is 600, allowing only processes with the effective UID of the creating process to attach the arena.
    CONF_ARENATYPE          Establish whether the arena can be attached by general processes or only by members of one program (a share group).
    CONF_LOCKTYPE           Whether or not lock objects allocated in the arena collect metering statistics as they are used.
    CONF_ATTACHADDR         An explicit memory base address for the next arena to be created.
    CONF_HISTON, CONF_HISTOFF    Start and stop collecting usage history (more bulky than metering information) for semaphores in a specified arena.
    CONF_HISTSIZE           Set the maximum size of semaphore history records.

IRIX IPC
  Tuned for the multiprocessor environment
  Utilizes “shared arena” memory
    Memory that can be mapped into the address spaces of multiple processes
  A shared arena is identified with a file that acts as the backing store for the arena memory
  Shared memory is pinned into physical memory, accessible by programs and the kernel

First Touch Rule
  Pages in an arena are allocated via first touch
    Places a virtual page in the node that first accesses it
  To ensure that spread-out processes have local access to their most-used pages, touch whole pages in the arena from the processes which use them most
  Dynamic reallocation will handle it otherwise, but more slowly

Linux Process Management
Alex MacFarlane

Threads
  Number of threads limited only by the size of physical memory.
  By default, set
  to half: max_threads = mempages / (THREAD_SIZE / PAGE_SIZE) / 2;
  Modifiable at runtime using sysctl() or the proc filesystem interface.
  Was limited to 4k in Linux 2.2

Thread Types
  Idle thread(s)
    One per CPU in an SMP system
    Created at boot time
  Kernel threads
  User-space threads
  Threads created by clone(), an extension to fork()

clone() flags
    CLONE_VM         Share data and stack
    CLONE_FS         Share filesystem info
    CLONE_FILES      Share open files
    CLONE_SIGHAND    Share signal handlers
    CLONE_PID        Share PID with parent

Linux Scheduling Policies
    SCHED_OTHER    Traditional UNIX scheduling
    SCHED_FIFO     Runs until blocking on I/O, explicitly yielding the CPU, or being pre-empted by a higher-priority realtime task.
    SCHED_RR       Same as SCHED_FIFO but limited to a timeslice
  Unprivileged user-space tasks must use SCHED_OTHER
  Static priorities may be assigned using nice()

Process Representation
  A collection of struct task_struct structures
  Linked in two ways:
    A hashtable hashed on pid
    A circular doubly-linked list
  Find a specific task using find_task_by_pid()
  Walk tasks using for_each_task()
  Modifications protected by a read-write spinlock.

Process States
  TASK_RUNNING: means the task is in the run queue.
  TASK_INTERRUPTIBLE: means the task is sleeping but can be woken up
    by a signal or by expiry of a timer.
  TASK_UNINTERRUPTIBLE: same as previous, except it cannot be woken up by a signal.
  TASK_ZOMBIE: task has terminated but has not had its status collected (wait()-ed for) by the parent (natural or by adoption).
  TASK_STOPPED: task was stopped, either due to job control signals or due to ptrace().
  TASK_EXCLUSIVE: this is not a separate state but can be OR-ed to either one of TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE. Prevents “thundering herd”.
  A process state may be modified asynchronously.

Atomic Operations
  Two types:
    Bitmap
    atomic_t
  Wrapped by bus locking on SMP
  Bitmap operations for free/allocated bitmaps
    set_bit(), clear_bit(), change_bit(), test_and_set_bit() etc.
  atomic_t operations for numeric counts
    atomic_read(), atomic_set(), atomic_add(), atomic_inc() etc.

References
  The SGI Origin software environment and application performance, Whitney, S.; McCalpin, J.; Bitar, N.; Richardson, J.L.; Stevens, L., Compcon 97, Proceedings, IEEE, 1997, pp. 165-170.
  An Integrated Kernel- and User-Level Paradigm for Efficient Multiprogramming, Masters Thesis, D. Craig, CSRD Technical Report No. 1533, University of Illinois at Urbana-Champaign, 1999.
  Integrated scheduling of multimedia and hard real-time tasks, Kaneko, H.; Stankovic, J.A.; Sen, S.; Ramamritham, K., 17th IEEE Real-Time Systems Symposium, 1996, pp. 206-217.
  An Efficient Kernel-level Scheduling Methodology for Multiprogrammed Shared Memory Multiprocessors, Proc. of the First Merged IPPS/SPDP Conference, pp. 392-397, Orlando, FL, 1998.
  Topics in IRIX Programming, Chapter 2, Interprocess Communication, Silicon Graphics, Inc., 2001.
  Topics in IRIX Programming, Chapter 3, Sharing Memory Between Processes, Silicon Graphics, Inc., 2001.
  Performance comparison of desktop multiprocessing and workstation cluster computing, Phyllis E. Crandall, Eranti V. Sumithasri, and Mark A. Clement, Proceedings of the Fifth International Symposium on High Performance Distributed Computing, August 1996.
  Flexibility and Performance of Parallel File Systems, David Kotz and Nils Nieuwejaar, ACM Operating Systems Review 30(2), ACM Press, April 1996, pp. 63-73.
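
Example: atomic counters and bitmaps (sketch)
  The Atomic Operations slide lists kernel primitives such as atomic_read(), atomic_set(), atomic_add(), atomic_inc() and test_and_set_bit(). The block below is a minimal user-space sketch of the same idea, assuming a C11 compiler with <stdatomic.h>. It is not the kernel implementation: the demo_ names and the demo_atomic_t type are hypothetical stand-ins invented for illustration, and the kernel uses its own architecture-specific code (bus locking on SMP, as the slide notes).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the kernel's atomic_t (assumption,
 * not the kernel's definition). */
typedef struct {
    atomic_int counter;
} demo_atomic_t;

/* Read the current value of the counter. */
static int demo_atomic_read(demo_atomic_t *v)
{
    return atomic_load(&v->counter);
}

/* Overwrite the counter with i. */
static void demo_atomic_set(demo_atomic_t *v, int i)
{
    atomic_store(&v->counter, i);
}

/* Add i to the counter as one indivisible read-modify-write. */
static void demo_atomic_add(int i, demo_atomic_t *v)
{
    atomic_fetch_add(&v->counter, i);
}

/* Increment the counter by one, indivisibly. */
static void demo_atomic_inc(demo_atomic_t *v)
{
    atomic_fetch_add(&v->counter, 1);
}

/* Atomically set bit nr in *word and report whether it was already set.
 * nr must be smaller than the number of bits in unsigned long. */
static bool demo_test_and_set_bit(unsigned nr, atomic_ulong *word)
{
    unsigned long mask = 1UL << nr;
    return (atomic_fetch_or(word, mask) & mask) != 0;
}
```

  Usage: starting from a zero-initialized demo_atomic_t v, the calls demo_atomic_set(&v, 5), demo_atomic_add(3, &v) and demo_atomic_inc(&v) leave demo_atomic_read(&v) at 9. The first demo_test_and_set_bit(2, &map) on an empty bitmap word returns false and claims the bit; a second call returns true, which is how a free/allocated bitmap can hand out a slot to exactly one caller.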