<h5>Semantics:</h5>
Memory is allocated, a pointer is returned. '<tt>alloca</tt>'d memory is automatically released when the method returns. The '<tt>alloca</tt>' utility is how variable spills shall be implemented.<p>
<h5>Example:</h5>
<pre>
%ptr = alloca int <i>; yields {int*}:ptr</i>
%ptr = alloca [int], uint 4 <i>; yields {[int]*}:ptr</i>
</pre>
<!-- _______________________________________________________________________ -->
</ul><a name="i_load"><h4><hr size=0>'<tt>load</tt>' Instruction</h4><ul>
<h5>Syntax:</h5>
<pre>
<result> = load <ty>* <pointer> <i>; yields {ty}:result</i>
<result> = load <ty>* <arrayptr>{, uint <idx>}+ <i>; yields {ty}:result</i>
<result> = load <ty>* <structptr>{, ubyte <idx>}+ <i>; yields field type</i>
</pre>
<h5>Overview:</h5>
The '<tt>load</tt>' instruction is used to read from memory.<p>
<h5>Arguments:</h5>
There are three forms of the '<tt>load</tt>' instruction: one for reading from a general pointer, one for reading from a pointer to an array, and one for reading from a pointer to a structure.<p>
In the first form, '<tt><ty>*</tt>' must be a pointer to a simple type (a primitive type or another pointer).<p>
In the second form, '<tt><ty>*</tt>' must be a pointer to an array, and a list of one or more indices selects an element of the (possibly multidimensional) array. No bounds checking is performed on array reads.<p>
In the third form, the pointer must point to a (possibly nested) structure. There shall be one ubyte argument for each level of dereferencing involved.<p>
<h5>Semantics:</h5>
...
<h5>Examples:</h5>
<pre>
%ptr = <a href="#i_alloca">alloca</a> int <i>; yields {int*}:ptr</i>
<a href="#i_store">store</a> int 3, int* %ptr <i>; yields {void}</i>
%val = load int* %ptr <i>; yields {int}:val = int 3</i>
%array = <a href="#i_malloc">malloc</a> [4 x ubyte] <i>; yields {[4 x ubyte]*}:array</i>
<a href="#i_store">store</a> ubyte 124, [4 x ubyte]* %array, uint 3
%val = load [4 x ubyte]* %array, uint 3 <i>; yields {ubyte}:val = ubyte 124</i>
%val = load {{int, float}}* %stptr, ubyte 0, ubyte 1 <i>; yields {float}:val</i>
</pre>
<!-- _______________________________________________________________________ -->
</ul><a name="i_store"><h4><hr size=0>'<tt>store</tt>' Instruction</h4><ul>
<h5>Syntax:</h5>
<pre>
store <ty> <value>, <ty>* <pointer> <i>; yields {void}</i>
store <ty> <value>, <ty>* <arrayptr>{, uint <idx>}+ <i>; yields {void}</i>
store <ty> <value>, <ty>* <structptr>{, ubyte <idx>}+ <i>; yields {void}</i>
</pre>
<h5>Overview:</h5>
The '<tt>store</tt>' instruction is used to write to memory.<p>
<h5>Arguments:</h5>
There are three forms of the '<tt>store</tt>' instruction: one for writing through a general pointer, one for writing through a pointer to a (possibly multidimensional) array, and one for writing to an element of a (potentially nested) structure.<p>
The semantics of this instruction closely match those of the <a href="#i_load">load</a> instruction, except that memory is written to, not read from.<p>
<h5>Semantics:</h5>
...
<h5>Example:</h5>
<pre>
%ptr = <a href="#i_alloca">alloca</a> int <i>; yields {int*}:ptr</i>
<a href="#i_store">store</a> int 3, int* %ptr <i>; yields {void}</i>
%val = load int* %ptr <i>; yields {int}:val = int 3</i>
%array = <a href="#i_malloc">malloc</a> [4 x ubyte] <i>; yields {[4 x ubyte]*}:array</i>
<a href="#i_store">store</a> ubyte 124, [4 x ubyte]* %array, uint 3
%val = load [4 x ubyte]* %array, uint 3 <i>; yields {ubyte}:val = ubyte 124</i>
%val = load {{int, float}}* %stptr, ubyte 0, ubyte 1 <i>; yields {float}:val</i>
</pre>
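As a rough analogy only, the three addressing forms shared by '<tt>load</tt>' and '<tt>store</tt>' can be sketched in C. The struct and function names here (<tt>inner</tt>, <tt>outer</tt>, <tt>load_scalar</tt>, and so on) are hypothetical and chosen for illustration; they are not part of the representation.<p>

```c
/* Hypothetical C layout mirroring the {{int, float}} type used above. */
struct inner { int i; float f; };
struct outer { struct inner in; };

/* First form: read or write through a pointer to a simple type. */
int load_scalar(const int *ptr) { return *ptr; }
void store_scalar(int value, int *ptr) { *ptr = value; }

/* Second form: a pointer to an array plus an index.
 * As in the text, no bounds checking is performed. */
unsigned char load_elem(unsigned char (*array)[4], unsigned idx) {
    return (*array)[idx];
}
void store_elem(unsigned char value, unsigned char (*array)[4], unsigned idx) {
    (*array)[idx] = value;
}

/* Third form: one field index per nesting level.
 * Field 0 of the outer struct, then field 1 of the inner struct. */
float load_field_0_1(const struct outer *stptr) {
    return stptr->in.f;
}
```

In this sketch, storing 124 at index 3 of a four-element array and loading it back returns 124; index 3 is the last element that lies inside the array.<p>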
<!-- _______________________________________________________________________ -->
</ul><a name="i_getelementptr"><h4><hr size=0>'<tt>getelementptr</tt>' Instruction</h4><ul>
<h5>Syntax:</h5>
<pre>
<result> = getelementptr <ty>* <arrayptr>{, uint <idx>}+ <i>; yields {ty*}:result</i>
<result> = getelementptr <ty>* <structptr>{, ubyte <idx>}+ <i>; yields field type*</i>
</pre>
<h5>Overview:</h5>
'<tt>getelementptr</tt>' performs all of the same work that a '<tt><a href="#i_load">load</a></tt>' instruction does, except for the actual memory fetch. Instead, '<tt>getelementptr</tt>' simply performs the addressing arithmetic to reach the element in question, and returns its address. This is useful for indexing into a bimodal structure, such as a structure that contains an array.<p>
<h5>Arguments:</h5>
<h5>Semantics:</h5>
<h5>Example:</h5>
<pre>
%aptr = getelementptr {int, [12 x ubyte]}* %sptr, ubyte 1 <i>; yields {[12 x ubyte]*}:aptr</i>
%ub = load [12 x ubyte]* %aptr, uint 4 <i>; yields {ubyte}:ub</i>
</pre>
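A minimal C sketch of the same idea, offered as an analogy rather than a definition: the address of a field is computed without touching memory, and a separate read then uses that address. The names <tt>struct s</tt>, <tt>bytes12</tt>, <tt>field1_addr</tt>, <tt>field1_addr_manual</tt> and <tt>load_byte</tt> are assumptions made for this example.<p>

```c
#include <stddef.h>

/* Hypothetical C layout for {int, [12 x ubyte]}. */
struct s { int first; unsigned char bytes[12]; };
typedef unsigned char bytes12[12];

/* Address arithmetic only: yields a pointer to field 1, reads nothing. */
bytes12 *field1_addr(struct s *sptr) {
    return &sptr->bytes;
}

/* The same address computed by hand from the field offset. */
bytes12 *field1_addr_manual(struct s *sptr) {
    return (bytes12 *)((char *)sptr + offsetof(struct s, bytes));
}

/* The actual memory fetch, performed separately through the array pointer. */
unsigned char load_byte(bytes12 *aptr, unsigned idx) {
    return (*aptr)[idx];
}
```

Splitting the work this way lets the structure index and the array index be applied in two steps, which is what the example above does with '<tt>getelementptr</tt>' followed by '<tt>load</tt>'.<p>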
<!-- ======================================================================= -->
</ul><table width="100%" bgcolor="#441188" border=0 cellpadding=4 cellspacing=0><tr><td> </td><td width="100%"> <font color="#EEEEFF" face="Georgia,Palatino"><b>
<a name="otherops">Other Operations
</b></font></td></tr></table><ul>
The instructions in this category are the "miscellaneous" instructions that defy better classification.<p>
<!-- _______________________________________________________________________ -->
</ul><a name="i_cast"><h4><hr size=0>'<tt>cast .. to</tt>' Instruction</h4><ul>
<h1>TODO</h1>
<a name="logical_integrals">
Talk about what is considered true or false for integrals.
<h5>Syntax:</h5>
<pre>
</pre>
<h5>Overview:</h5>
<h5>Arguments:</h5>
<h5>Semantics:</h5>
<h5>Example:</h5>
<pre>
</pre>
<!-- _______________________________________________________________________ -->
</ul><a name="i_call"><h4><hr size=0>'<tt>call</tt>' Instruction</h4><ul>
<h5>Syntax:</h5>
<pre>
</pre>
<h5>Overview:</h5>
<h5>Arguments:</h5>
<h5>Semantics:</h5>
<h5>Example:</h5>
<pre>
%retval = call int %test(int %argc)
</pre>
<!-- _______________________________________________________________________ --></ul><a name="i_icall"><h3><hr size=0>'<tt>icall</tt>' Instruction</h3><ul>
Indirect calls are desperately needed to implement virtual function tables (C++, Java) and function pointers (C, C++, ...).<p>
A new instruction <tt>icall</tt> or similar should be introduced to represent an indirect call.<p>
Example:
<pre>
%retval = icall int %funcptr(int %arg1) <i>; yields {int}:%retval</i>
</pre>
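To illustrate the two motivating cases in source-language terms, here is a small C sketch of a call through a plain function pointer and a call through a hand-built virtual function table. All names (<tt>shape</tt>, <tt>shape_vtable</tt>, <tt>apply</tt>, <tt>twice</tt>, and so on) are hypothetical; the sketch says nothing about how '<tt>icall</tt>' itself would be specified.<p>

```c
/* Hypothetical "class" with a one-entry virtual function table. */
struct shape;
struct shape_vtable { int (*area)(const struct shape *self); };
struct shape { const struct shape_vtable *vtable; int w, h; };

int rect_area(const struct shape *self) { return self->w * self->h; }
int square_area(const struct shape *self) { return self->w * self->w; }

const struct shape_vtable rect_vtable = { rect_area };
const struct shape_vtable square_vtable = { square_area };

/* Function pointer case: the callee is a runtime value, not a fixed name. */
int apply(int (*funcptr)(int), int arg1) {
    return funcptr(arg1);
}

/* Virtual dispatch case: fetch the function pointer from the table,
 * then call through it indirectly. */
int shape_area(const struct shape *s) {
    return s->vtable->area(s);
}

int twice(int x) { return 2 * x; }
```

In both cases the target of the call is not known from the call site alone, which is the situation a direct '<a href="#i_call"><tt>call</tt></a>' to a named method cannot express.<p>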
<!-- _______________________________________________________________________ -->
</ul><a name="i_phi"><h4><hr size=0>'<tt>phi</tt>' Instruction</h4><ul>
<h5>Syntax:</h5>
<pre>
</pre>
<h5>Overview:</h5>
<h5>Arguments:</h5>
<h5>Semantics:</h5>
<h5>Example:</h5>
<pre>
</pre>
<!-- ======================================================================= -->
</ul><table width="100%" bgcolor="#441188" border=0 cellpadding=4 cellspacing=0><tr><td> </td><td width="100%"> <font color="#EEEEFF" face="Georgia,Palatino"><b>
<a name="builtinfunc">Builtin Functions
</b></font></td></tr></table><ul>
<b>Notice:</b> Preliminary idea!<p>
Builtin functions are very similar to normal functions, except they are defined by the implementation. Invocations of these functions are very similar to method invocations, except that the syntax is a little less verbose.<p>
Builtin functions are useful to implement semi-high level ideas like a '<tt>min</tt>' or '<tt>max</tt>' operation that can have important properties when doing program analysis. For example:
<ul>
<li>Some optimizations can make use of identities defined over the functions,
for example a parallelizing compiler could make use of '<tt>min</tt>'
identities to parallelize a loop.
<li>Builtin functions would have polymorphic types, where normal method calls
may only have a single type.
<li>Builtin functions would be known to not have side effects, simplifying
analysis over straight method calls.
<li>The syntax of a builtin is cleaner than the syntax of the
'<a href="#i_call"><tt>call</tt></a>' instruction (very minor point).
</ul>
Because these invocations are explicit in the representation, the runtime can choose to implement these builtin functions any way that it wants, including:
<ul>
<li>Inlining the code directly into the invocation
<li>Implementing the functions in some sort of Runtime class, and converting the
invocation to a standard method call.
<li>Implementing the functions in some sort of Runtime class, and performing
standard inlining optimizations on it.
</ul>
Note that these builtins do not use quoted identifiers: the name of the builtin effectively becomes an identifier in the language.<p>
Example:
<pre>
; Example of a normal method call
%maximum = call int %maximum(int %arg1, int %arg2) <i>; yields {int}:%maximum</i>
; Examples of potential builtin functions
%max = max(int %arg1, int %arg2) <i>; yields {int}:%max</i>
%min = min(int %arg1, int %arg2) <i>; yields {int}:%min</i>
%sin = sin(double %arg) <i>; yields {double}:%sin</i>
%cos = cos(double %arg) <i>; yields {double}:%cos</i>
; Show that builtins are polymorphic, like instructions
%max = max(float %arg1, float %arg2) <i>; yields {float}:%max</i>
%cos = cos(float %arg) <i>; yields {float}:%cos</i>
</pre>
The '<tt>maximum</tt>' vs '<tt>max</tt>' example illustrates the difference in calling semantics between a '<a href="#i_call"><tt>call</tt></a>' instruction and a builtin function invocation. Notice that the '<tt>maximum</tt>' example assumes that the method is defined local to the caller.<p>
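As one illustration of the point about identities, the following C sketch shows how the associativity and commutativity of '<tt>min</tt>' allow a reduction loop to be split into independent halves. The function names (<tt>min_int</tt>, <tt>min_seq</tt>, <tt>min_split</tt>) are hypothetical, and the two halves are run one after the other here; a parallelizing compiler could, under these identities, run them concurrently.<p>

```c
#include <limits.h>

int min_int(int a, int b) { return a < b ? a : b; }

/* Straightforward sequential reduction over n elements.
 * Returns INT_MAX for an empty range (the identity element of min). */
int min_seq(const int *v, int n) {
    int m = INT_MAX;
    for (int i = 0; i < n; i++)
        m = min_int(m, v[i]);
    return m;
}

/* Split reduction: the two halves do not depend on each other.
 * Combining them with min gives the same answer as min_seq because
 * min is associative and commutative. */
int min_split(const int *v, int n) {
    int half = n / 2;
    int lo = min_seq(v, half);
    int hi = min_seq(v + half, n - half);
    return min_int(lo, hi);
}
```

A compiler that only sees an opaque method call cannot assume these identities, which is part of the argument for making such operations builtins.<p>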
<!-- *********************************************************************** -->
</ul><table width="100%" bgcolor="#330077" border=0 cellpadding=4 cellspacing=0><tr><td align=center><font color="#EEEEFF" size=+2 face="Georgia,Palatino"><b>
<a name="todo">TODO List
</b></font></td></tr></table><ul>
<!-- *********************************************************************** -->
This list of random topics includes things that will <b>need</b> to be addressed before the llvm may be used to implement a Java-like language. Right now, it is pretty much useless for any language, given the unavailability of structure types.<p>
<!-- _______________________________________________________________________ -->
</ul><a name="synchronization"><h3><hr size=0>Synchronization Instructions</h3><ul>
We will need some type of synchronization instructions to be able to implement Java well. The way I currently envision doing this is to introduce a '<tt>lock</tt>' type, and then add two operations (either builtins or instructions) to lock and unlock the lock.<p>
<!-- *********************************************************************** -->
</ul><table width="100%" bgcolor="#330077" border=0 cellpadding=4 cellspacing=0><tr><td align=center><font color="#EEEEFF" size=+2 face="Georgia,Palatino"><b>
<a name="extensions">Possible Extensions
</b></font></td></tr></table><ul>
<!-- *********************************************************************** -->
These extensions are distinct from the TODO list, as they are mostly "interesting" ideas that could be implemented in the future by someone so motivated. They are not directly required to get <a href="#rw_java">Java</a>-like languages working.<p>
<!-- _______________________________________________________________________ -->
</ul><a name="i_tailcall"><h3><hr size=0>'<tt>tailcall</tt>' Instruction</h3><ul>
This could be useful. Who knows. '.net' does it, but is the optimization really worth the extra hassle? Using strong typing would make this trivial to implement, and a runtime could always fall back to downconverting this to a normal '<a href="#i_call"><tt>call</tt></a>' instruction.<p>
<!-- _______________________________________________________________________ -->
</ul><a name="globalvars"><h3><hr size=0>Global Variables</h3><ul>
In order to represent programs written in languages like C, we need to be able to support variables at the module (global) scope. Perhaps they should be written outside of the module definition even. Maybe global functions should be handled like this as well.<p>
<!-- _______________________________________________________________________ -->
</ul><a name="explicitparrellelism"><h3><hr size=0>Explicit Parallelism</h3><ul>
With the rise of massively parallel architectures (like <a href="#rw_ia64">the IA64 architecture</a>, multithreaded CPU cores, and SIMD data sets) it is becoming increasingly important to extract all of the ILP possible from a code stream. It would be interesting to research encoding methods that can explicitly represent this. One straightforward way to do this would be to introduce a "stop" instruction that is equivalent to the IA64 stop bit.<p>
<!-- *********************************************************************** -->
</ul><table width="100%" bgcolor="#330077" border=0 cellpadding=4 cellspacing=0><tr><td align=center><font color="#EEEEFF" size=+2 face="Georgia,Palatino"><b>
<a name="related">Related Work
</b></font></td></tr></table><ul>
<!-- *********************************************************************** -->
Codesigned virtual machines.<p>
<dl>
<a name="rw_safetsa">
<dt>SafeTSA
<DD>Description here<p>
<a name="rw_java">
<dt><a href="http://www.javasoft.com">Java</a>
<DD>Description here<p>
<a name="rw_net">
<dt><a href="http://www.microsoft.com/net">Microsoft .net</a>
<DD>Description here<p>
<a name="rw_gccrtl">
<dt><a href="http://www.math.umn.edu/systems_guide/gcc-2.95.1/gcc_15.html">GNU RTL Intermediate Representation</a>
<DD>Description here<p>
<a name="rw_ia64">
<dt><a href="http://developer.intel.com/design/ia-64/index.htm">IA64 Architecture & Instruction Set</a>
<DD>Description here<p>
<a name="rw_mmix">
<dt><a href="http://www-cs-faculty.stanford.edu/~knuth/mmix-news.html">MMIX Instruction Set</a>
<DD>Description here<p>
<a name="rw_stroustrup">
<dt><a href="http://www.research.att.com/~bs/devXinterview.html">"Interview With Bjarne Stroustrup"</a>
<DD>This interview influenced the design and thought process behind LLVM in several ways, most notably the way that derived types are written in text format. See the question that starts with "you defined the C declarator syntax as an experiment that failed".<p>
</dl>
<!-- _______________________________________________________________________ -->
</ul><a name="rw_vectorization"><h3><hr size=0>Vectorized Architectures</h3><ul>
<dl>
<a name="rw_intel_simd">
<dt>Intel MMX, MMX2, SSE, SSE2
<DD>Description here<p>
<a name="rw_amd_simd">
<dt><a href="http://www.nondot.org/~sabre/os/H1ChipFeatures/3DNow!TechnologyManual.pdf">AMD 3DNow!, 3DNow! 2</a>
<DD>Description here<p>
<a name="rw_sun_simd">
<dt><a href="http://www.nondot.org/~sabre/os/H1ChipFeatures/VISInstructionSetUsersManual.pdf">Sun VIS ISA</a>
<DD>Description here<p>
</dl>
more...
<!-- *********************************************************************** -->
</ul>
<!-- *********************************************************************** -->
<hr>
<font size=-1>
<address><a href="mailto:sabre@nondot.org">Chris Lattner</a></address>
<!-- Created: Tue Jan 23 15:19:28 CST 2001 -->
<!-- hhmts start -->
Last modified: Sun Jul 8 19:25:56 CDT 2001