
Computer Communications 23 (2000) 1594–1605

On object initialization in the Java bytecode☆

S. Doyon*, M. Debbabi

LSFM Research Group, Department of Computer Science, Laval University, Sainte Foy, Que., Canada G1K 7P4

Abstract

Java is an ideal platform for implementing mobile code systems, not only because of its portability but also because it is designed with security in mind. Untrusted Java programs can be statically analyzed and validated. The program's behavior is then monitored to prevent potentially malicious operations. Static analysis of untrusted classes is carried out by a component of the Java virtual machine called the verifier. The most complex part of the verification process is the dataflow analysis, which is performed on each method in order to ensure type-safety. This paper clarifies in detail one of the tricky aspects of the dataflow analysis: the verification of object initialization. We present and explain the rules that need to be enforced and we then show how verifier implementations can enforce them. Rules for object creation require, among other things, that uninitialized objects never be used before they are initialized. Constructors must properly initialize their this argument before they are allowed to return. This paper also deals with initialization failures (indicated by exceptions): the object being initialized must be discarded, and constructors must propagate initialization failures. © 2000 Elsevier Science B.V. All rights reserved.

Keywords: Java bytecode; Object initialization; Dataflow analysis; Static analysis; Java security

1. Introduction

The Java architecture is particularly well-suited for implementing mobile code systems. A mobile code architecture allows a computer to fetch a program (or parts of a program) from a network source and execute it locally. However, security is a critical aspect of mobile code architectures. The very essence of mobile code is to execute a program that originates from a remote source.
This is inherently dangerous because it is not known what actions that program will take. By executing the mobile code, we are allowing it to perform operations on our machine and we are giving it access to our local resources.

Java is especially well-suited for implementing mobile code systems for three reasons:

• Java source is compiled into a platform-independent intermediate form called Java bytecode. Java bytecode is then interpreted by the JVM (Java virtual machine). This makes Java bytecode completely portable, which means a piece of Java code in compiled form should run on any receiving machine.
• It is dynamically linked: the JVM will load classes from different network sources as they are needed and will link them into the program while it runs.
• The Java architecture is built with security in mind: its design makes it possible to enforce sufficient security to make mobile code safe and practical.

☆ The research reported in this paper has been supported by the National Science and Engineering Research Council (NSERC), the Fonds pour la formation de chercheurs et l'aide à la recherche (FCAR), and the Defense Research Establishment Valcartier (DREV), Department of National Defense.
* Corresponding author. Tel.: +1-418-656-7035; fax: +1-418-656-2324. E-mail address: doyon@ift.ulaval.ca (S. Doyon).

Currently, the most popular manifestation of Java mobile code is applets. A JVM (bytecode interpreter) is incorporated in web browsers. Web pages can then include links that point to the compiled (bytecode) form of programs, which are called applets. The applet can then be loaded by the browser and executed locally with no special effort on the user's part.

The verifier is a key component of the Java security architecture. Its role is to examine compiled classes as they are loaded into the JVM in order to ensure that they are well-formed and valid. It checks that the code respects the syntax of the bytecode language and that it respects the language rules.
Another component of the Java security architecture, called the security manager, monitors access to system resources and services. The security manager is a security layer that sits on top of the verifier and relies on its effectiveness.

The most complex step of the verification process performed by the verifier requires running a dataflow analysis on the body of each method. There are a few particularly tricky issues regarding the dataflow analysis. In this paper, we focus on the issues relating to the initialization of new objects:

• Issues relating to object creation: A new object is created in two steps: space is allocated for the new object, and then it is initialized. When performing the dataflow analysis, the verifier must ensure that certain rules are respected: the constructor used to initialize an object must be appropriate, an object must not be used before it is initialized, an object must not be initialized more than once and initialization failures (indicated by exceptions) must be handled properly.
• Issues relating to constructors: The constructor is responsible for initializing a new object. The first part of the constructor's work performs initialization from a typing point of view, which implies directly or indirectly calling a constructor from the superclass. The rest of the constructor performs application-specific initialization. The verifier must ensure that a constructor properly initializes the current object before it returns, that it does not use the current object in any way before calling the superclass constructor and that it propagates any initialization failure occurring in the superclass constructor.

0140-3664/00/$ - see front matter. PII: S0140-3664(00)00245-0

The official documentation on the verifier, provided in (Ref. [1], Sections 4.8 and 4.9) and in Ref.
[2], is relatively sparse; the portions discussing object initialization are very brief and vague, and they leave out some important issues. Independent work presented in Ref. [3] has clarified many aspects. Freund and Mitchell have extended the formalization of a subset of the Java bytecode language introduced in Ref. [4]. They used a type system to describe the verifier's handling of object initialization.

Our paper reviews and explains the rules related to object initialization and discusses how a verifier implementation can enforce them. We also touch on a few issues not discussed in Ref. [3]. Exceptions thrown during object initialization indicate initialization failures and must be handled properly, both inside and outside of a constructor. We also provide a comprehensive, intuitive explanation of how the rules for object creation can be enforced with minimal effort.

We assume that the reader has some knowledge of the Java bytecode language, as well as a basic understanding either of dataflow analysis in general or of the particular analysis technique used by the Java bytecode verifier. Readers unfamiliar with these topics may consult the following references. The Java language is defined in its official specification [5]. A more accessible explanation of the language and its concepts is given in Ref. [6]. For details on the Java standard library, see Ref. [7]. The workings of the JVM and the bytecode instruction set are described in the official JVM specification [1]. For a lighter approach, see Ref. [8].

To gain a good understanding of the Java bytecode language, it is necessary to experiment with it. Two tools are essential. The first is a class file disassembler, which prints out a class file (and in particular the bytecode) in a readable format. Sun's javap tool, which comes with the JDK, can be used for this, although other alternatives are available. The second is a
A byte-code assembler, that produces class ®les from some source with a manageable syntax. Otherwise, constructing binary class ®les by hand would be dif®cult and time consuming. A great solution is the excellent jasmin [9]. This paper is organized as follows. Section 2 provides a brief overview of the data¯ow analysis in order to show the context in which veri®cation of object initialization occurs. Section 3 deals with the creation of new objects, while Section 4 explains the special requirements imposed on constructors. Each of these sections ®rst presents the neces-sary rules that the veri®er must somehow enforce, and then discusses how an implementation could achieve the desired result. Section 5 shows that constructors may ªleakº or ªsaveº a copy of their this reference, which means that it is possible for incompletely initialized objects to be actually used. Section 6 lists some of the related work. Some concluding remarks are ultimately sketched as a conclusion in Section 7. 2. Data¯ow analysis The Java bytecode veri®er ensures that the classes loaded by the JVM do not compromise the security of the system, either through disrespect of the language rules or through compromise of the integrity of the virtual machine. The veri®er validates many syntactical aspects of the class ®le. It validates ®eld and method declarations. It makes some checks relating to the superclass. It veri®es references to other classes, other methods and ®elds and it enforces access restriction mechanisms (like protected, private and ®nal). The body of each method is examined in turn: each byte-code instruction and its operands are validated. The most complex yet most interesting part of the veri®-cation process is the data¯ow analysis. It is performed inde-pendently on each method. 
The dataflow analysis checks that each bytecode instruction gets arguments of the proper type (from the stack or from the registers), detects and prevents overflows and underflows of the expression evaluation stack, and ensures that subroutines are used consistently. The dataflow analysis must also check that object initialization is performed correctly. This paper will attempt to clarify the properties that need to be enforced on object creation and constructors. We will also propose ways in which a verifier implementation can enforce those rules.

In order to perform the dataflow analysis, it is necessary to keep track of the type of each value on the stack and in the registers at each program point. We will assume that each instruction of a method constitutes a program point, although it is possible to use basic blocks of instructions as program points. The type recorded by the dataflow analysis for a given location at a given program point must be consistent, irrespective of the execution path used to reach that program point. When there is a conflict because two or more paths would yield different types of values for the same location, we record for that location a common supertype of all the types that could actually occur. For instance, if at a given program point a certain location could contain either an instance of FileInputStream or an instance of ByteArrayInputStream, the dataflow analysis "merges" the two types and records the type InputStream instead. If there are no common supertypes for the possible types in a certain location, then the type unusable is used, indicating that the value cannot be used by the following instructions. This generalization of types does imply a loss of information and precision. This is what makes the analysis conservative, in the sense that it is pessimistic.
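The merging step described above can be illustrated with a minimal sketch. It is not the verifier's actual implementation: the class name TypeMergeSketch, the method merge and the use of void.class as a stand-in for the unusable type are assumptions made for this example, and java.lang.Class objects are used in place of a real verifier's internal type representation. Interfaces and arrays are deliberately simplified to Object.

```java
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;

// Hypothetical sketch of type merging at a control-flow join point.
final class TypeMergeSketch {
    // Assumption: void.class stands in for the paper's "unusable" type.
    static final Class<?> UNUSABLE = void.class;

    static Class<?> merge(Class<?> a, Class<?> b) {
        if (a == b) {
            return a; // identical types need no generalization
        }
        if (a.isPrimitive() || b.isPrimitive()) {
            return UNUSABLE; // e.g. int vs. reference: no common supertype
        }
        if (a.isInterface() || b.isInterface() || a.isArray() || b.isArray()) {
            return Object.class; // simplification made for this sketch only
        }
        // Walk up a's superclass chain until a class that also covers b is found.
        for (Class<?> c = a; c != null; c = c.getSuperclass()) {
            if (c.isAssignableFrom(b)) {
                return c;
            }
        }
        return Object.class;
    }

    public static void main(String[] args) {
        // prints java.io.InputStream
        System.out.println(merge(FileInputStream.class, ByteArrayInputStream.class).getName());
        // prints true
        System.out.println(merge(int.class, String.class) == UNUSABLE);
    }
}
```

The first call reproduces the FileInputStream and ByteArrayInputStream example from the text; the second shows a location that becomes unusable because the two paths disagree on whether it holds a primitive or a reference.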
Types used in the dataflow analysis are primitive types (single-word int or float, or double-word long or double) and reference types (the types associated with references to objects or arrays). A reference type may be a class, interface or array type (which specifies a base type and a number of dimensions). The type returnAddress will be used to describe the return address of a subroutine, as created by the jsr instruction. The special type named unusable is used to mark uninitialized registers. The special reference type null is used to represent the type of null references produced by the aconst_null instruction. Also note that implementations will generally use other special types to represent allocated but not yet initialized objects.

3. Object creation

Creating a new object is done in two steps. First, space for the object is allocated through the use of the new instruction, which returns a reference that points to the newly allocated memory space. Then, the object is initialized by invoking one of its constructors (a method named <init>). For example, the Java statement new String() is translated to the following bytecode instructions:

    ; allocate space for String and push
    ; reference to it onto the stack
    new java/lang/String
    ; duplicate top stack item (reference to
    ; newly allocated space)
    dup
    ; call String.String() constructor, uses
    ; up one of the references to newly allocated
    ; space as "this" argument.
    invokespecial java/lang/String/<init>()V
    ; This leaves a reference to the new
    ; String object on the stack.

The constructor is responsible for putting the object in a valid state. Until initialization of the new object completes, its state remains undefined and may be inconsistent. The language semantics therefore disallows using a newly allocated object before it is initialized. Enforcing this is one of the verifier's responsibilities.
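To make the two-step creation sequence above more concrete, here is a rough sketch of how an analysis might track a newly allocated object on an abstract operand stack. It follows one common approach (tagging the uninitialized type with the offset of the allocating new instruction) and is not necessarily the scheme this paper develops. All names (InitTrackingSketch, newObject, invokeInit, requireInitializedTop) are hypothetical, types are plain strings, only no-argument constructors are modelled, and registers and exception handlers are ignored.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: abstract operand stack tracking uninitialized objects.
final class InitTrackingSketch {
    private final Deque<String> stack = new ArrayDeque<>(); // head = top of stack

    // "new": push a special type tagged with the instruction offset.
    void newObject(String className, int offset) {
        stack.push("uninit:" + className + "@" + offset);
    }

    // "dup": copy the top entry, including its uninitialized tag.
    void dup() {
        if (stack.isEmpty()) throw new IllegalStateException("stack underflow");
        stack.push(stack.peek());
    }

    // "invokespecial <init>" with no arguments: consume the receiver and
    // mark every remaining copy of the same allocation as initialized.
    void invokeInit(String className) {
        if (stack.isEmpty()) throw new IllegalStateException("stack underflow");
        String receiver = stack.pop();
        if (!receiver.startsWith("uninit:" + className + "@")) {
            // Wrong class, or an object that is already initialized.
            throw new IllegalStateException("bad <init> receiver: " + receiver);
        }
        List<String> topFirst = new ArrayList<>(stack);
        stack.clear();
        for (int i = topFirst.size() - 1; i >= 0; i--) { // rebuild bottom-up
            String t = topFirst.get(i);
            stack.push(t.equals(receiver) ? className : t);
        }
    }

    // Any ordinary use of the top entry requires an initialized object.
    void requireInitializedTop() {
        if (stack.isEmpty()) throw new IllegalStateException("stack underflow");
        if (stack.peek().startsWith("uninit:")) {
            throw new IllegalStateException("use of uninitialized object");
        }
    }

    List<String> snapshot() {
        return new ArrayList<>(stack);
    }

    public static void main(String[] args) {
        InitTrackingSketch v = new InitTrackingSketch();
        v.newObject("java/lang/String", 0); // new java/lang/String
        v.dup();                            // dup
        v.invokeInit("java/lang/String");   // invokespecial <init>
        v.requireInitializedTop();
        System.out.println(v.snapshot());   // prints [java/lang/String]
    }
}
```

Because the initialized type no longer carries the uninit tag, a second invokeInit on the same reference is rejected in this sketch, and calling requireInitializedTop right after newObject fails as well.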
The verifier must keep track of which objects are initialized and which are not, ensure that proper constructors are used to initialize new objects, and make sure that uninitialized objects are not used before they are initialized. This is one of the tricky points of the dataflow analysis. Ref. [1] covers this aspect briefly. Ref. [3] presents a detailed analysis and formal specification of the language rules related to object initialization. Unfortunately, neither Ref. [3] nor Ref. [1] discusses the interaction between object initialization and exception handlers. We will first discuss the rules that the verifier should enforce, and we will then consider how a verifier implementation can enforce them.

3.1. Rules

The verifier must enforce the following properties:

• An object must not be used before it is initialized.
• An uninitialized object must be initialized by one of the constructors declared in its class. A constructor from another class cannot be used. Notice that methods named <init> are not inherited.
• An object must not be initialized more than once.
• If an exception is thrown by the call to the instance initialization method, then the new object must not be used because its initialization is incomplete.

We first discuss what it means for an uninitialized object (or r
