ep.lgb.hu :: Enterprise-128 EXOS file analyzer, lister and disassembler

This software can be used to analyze, list and disassemble Enterprise-128 EXOS files (or EXOS_ROM / raw binary files/images) on a PC. This means IS-BASIC, IS-FORTH, WP and ML programs.
It includes character set conversion to UTF-8, optional "hex debug" and "info" modes, and a choice between text and HTML output.
The script itself can also dump its internal "database" (conversion DB), so you can examine the BASIC tokens and the character set(s) used on the Enterprise, plus other internal information (Z80 opcode table, EXOS functions, EXOS file types, I/O ports, etc).
The name "EPBAS" came from the original functionality: list IS-BASIC programs, and nothing more.

WARNING! The documentation currently is a big heap of mess. It should be really re-organized :)

INTRODUCTION

This software can be used to analyze, display and disassemble Enterprise-64 and 128 EXOS structured files / ROM images / programs on PC (or on the web, see later).

The latest version of this document can be found here (it is probably more up to date than this copy, so read that one if you are not there already): epbas.lgb.hu/readme.html

Project page: epbas.lgb.hu/

(The name "EPBAS" is from the original functionality: to list IS-BASIC programs, and nothing more. Currently the functionality of the project is much wider than that.)

Currently, IS-FORTH (type-1), IS-BASIC programs (type-4), editor documents (type-8, ie: WP files), ML user programs (type-5) and raw binary images are supported (together with EXOS_ROM images). With ML/raw mode, disassembly can be requested as well.

The loaded file is parsed for multiple headers and, as with EXOS itself, the final type-10 header causes the converter to stop. The converter also takes care to create valid UTF-8 output, which means that you may need to specify the EP character set used. The output can be plain TEXT or HTML, with or without "info mode" and "debug hex mode".

For more information about the development please check out the section "CHANGELOG" at the end of this file.

Additionally, the converter can be used to create "nice" HTML pages of the known character sets, Z80 opcodes, etc. It helps to fix bugs :) The internal DBs used to describe various ops can be dumped as well.

Please note that this file is a very simply re-formatted HTML variant of the README file (text only), which can be found in the downloadable version as well.

COPYRIGHT

This program can be used/distributed/modified according to the GNU GPL v3 or later. Visit www.gnu.org/licenses/gpl.html for more information. In a nutshell (described in a very amateurish and not so correct way): you can use/redistribute this program without restrictions, other than that you may not change the copyright / license. You can also modify it, or even create other projects using this work, if your work is also covered by this license and its source is made freely available.

Any - constructive - feedback/help is welcome.

RECOMMENDED INFORMATION / GREETINGS

Special thanks to the Hungarian Enterprise Forever forum, especially EP-Jedi Master Zozo. :-)

TRY IT ONLINE

If you don't want to install this software (or its requirements, see below), you can even try it out online: epbas.lgb.hu/tryit/

INSTALLATION

If you choose to install it: epbas.lgb.hu/epbas.zip

Note: by downloading the software you get the "off-line" version, which is a command line tool. The on-line mode uses a little wrapper which is _not_ part of the downloadable version, but basically it merely calls the very same software anyway.

The converter is written in Python. On an average Linux (and maybe UNIX) system Python should already be installed, so you can directly run the script (epbas.py) as an "executable" (you may need to give executable permission to the file though).

Please note that multiple .py files belong to this program, and you need all of them!

On Windows, I haven't got too much idea, as I never have had Windows. For sure, Python runs on Windows, but it's up to you to figure out how to do it (www.python.org). As far as I know, it's possible on Windows to "assign" *.py extension to the python.exe interpreter somehow, in that case you can "launch" the .py file "directly".

Note about Python: Python2 and Python3 are quite different "beasts". :) Though I test my converter with both major versions, I mainly use Python2. It's also important to note that older versions within the v2 branch may not know about newer constructs I use (like bytearray). So in a nutshell: I recommend Python 2.7, which should work. Earlier or later versions can cause problems.

USAGE, COMMAND LINE

Without any parameter, you'll get a summary on the syntax. Note, that some not-yet-finished switches are also shown which won't work.

There are some parameters which select special modes and can be used only WITHOUT any other file names/switches. They are:

-db: Dumps the internal "databases" of BASIC tokens, character sets, and EXOS header types known to this program as a simple text file to STDOUT.
-chset: Dumps the known character sets as a "nice" HTML page to STDOUT.
-z80ops: Dumps the Z80 opcode matrix as a "nice" HTML page to STDOUT. FD prefixed opcodes are missing because they are the same as the DD ones; only an IX<->IY change is needed.
-version: This will print the version number.

Besides these, the normal usage of the converter requires zero or more switches and an input file name (with or without path). Note: the LAST parameter of the command line is ALWAYS treated as the input file name/path! Before the file name, these switches can be used:

-out=FILENAME: Writes the output to the file FILENAME, which can also contain a path. Without this switch, the output is printed to STDOUT.
-hex: Debug mode: do hex dumps while parsing file.
-info: Info mode: prints information on EXOS headers and on the parameters used by the converter. It also shows the last "end of file" header (type-10). The handling of specific types may also become more verbose if you use this switch. It's recommended to use -info unless you're only interested in the clean listing and nothing more.
-cset=UK: Sets the EP character set. UK is the default so it's useless with UK, but you can specify others. You can use the special -db mode (see above) to get some idea about known EP character sets. You can also use the -chset special mode to create detailed character set map in the form of a HTML file.
-html: Produces HTML output instead of TEXT. Without this switch given, the output will be TEXT, which is quite confusing especially if you use -hex and/or -info mode. In HTML mode some kind of syntax-highlighting is used as well. HTML mode is recommended if it's acceptable for you, because of the help given by syntax highlighting and also the ability to follow code with the link/anchor mode. Of course a HTML page needs a web browser, so if you need a text file to be further processed, you may not want to use it anyway.
-bin=LOAD,START: Treat the input file as a BINARY file, not EXOS! LOAD and START are hex (!) numbers, written without any prefix or postfix, which specify the load and start addresses. The converter creates a hex dump of the file _OR_ tries to disassemble it if -dasm is also given (the START address is only used then). This can be used eg to analyze a ROM image. You can give more hex numbers (again separated by commas) as hints for the disassembler about code analysis starting points (this does not make any sense unless you use the -dasm switch too). Read the sections BINARY INPUT MODE and DISASSEMBLER to learn more on this topic.
-nolinks: Only meaningful with -html mode. By default, HTML mode uses anchors/links to allow to follow the program/data structure where it's available (for more information see HTML OUTPUT section of this document). With this switch you can disable it.
-dasm: If the converter finds a machine code program (ie: type-5 or using the -bin switch) it dumps it in "standard" hex dump format (even without -hex given!). If you specify this option, the converter tries to disassemble it instead. It's quite a complex topic, so read the "DISASSEMBLER" section for the details. Please note that this is considered a HIGHLY EXPERIMENTAL feature, it MAY NOT EVEN WORK AT ALL. It can also be used with the -bin switch. DOES NOT WORK YET: you can specify a file after -dasm (ie: -dasm=source.asm) which should be the source from the previous run (for the same file!) but with user modifications. This will be processed to give "hints" and also to include user comments.
-savepic=filename: If a file contains an image in a supported format, it is converted into a GIF image and shown in -html mode (not in text!). With this option, you can request to save the image to a file instead of just displaying it. If you don't specify a parameter for -savepic (ie without the =filename part), the converter tries to display the image interactively in a window, but this is Python PIL stuff and may work only on Linux, I have no idea.

Most switches can be used together without problem, however there are some exceptions. Program will warn you if this is the case so you don't need to worry about this issue.

IMAGE OUTPUT

If a format contains a viewable image, it can be displayed, but only in HTML mode (see below, section "HTML OUTPUT"). However it's possible to save the image as a file with the -savepic=filename switch, see above. In text mode (without -html), only a message is displayed saying that the image cannot be rendered.

HTML OUTPUT

HTML output mode can be requested with the -html switch.

One advantage of HTML mode is syntax highlighting. Another one is the anchor/link scheme, that is: you can click on branches (GOTO numbers etc) to follow the execution of the program. It can help to understand complex programs.

This link/anchor mode can be disabled with the -nolinks switch.

With link/anchor mode not disabled, HTML anchors are generated with prefixes like L0_. The number is the EXOS header "number" (eg: the first header is 0, the second is 1). It's needed to handle the situation of having more "items" in one file, so links won't conflict between modules.

GUESSED INPUT MODE

This is the default mode, unless binary mode (-bin switch, see later) is specified.

In this mode the input file is treated as an EXOS file, unless the EXOS_ROM string is found, in which case it's treated as an EXOS ROM as a whole. If the input file is not an EXOS_ROM and does not seem to have a valid EXOS header either, the result is an error.

BINARY INPUT MODE

With the -bin switch, you can specify binary input. In this case, no EXOS header is examined, and the input file is treated as a block of raw ML program. The output of the converter in this case is similar to that for EXOS type=5 files: a hex dump of the file, or disassembled source if -dasm is specified as well.

Note that in theory you can use the -bin switch on an EXOS type=5 file too. This is possible, as a type=5 EXOS file is simply the EXOS header followed by the program stream, with no ending header (no type=10 header at the end). Thus you can disassemble a type=5 EXOS file in binary input mode as well, with the following switches:

-bin=F0,100 -dasm

The load address 0xF0 is needed because the EXOS header is still there, and the "real" program should be placed at 0x100 (the EXOS header is 16 bytes long: 0xF0 + 16 = 0x100).
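The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name and the constants are made up for illustration and are not part of the converter.

```python
EXOS_HEADER_LEN = 16   # an EXOS header is 16 bytes long
TYPE5_START = 0x100    # type-5 programs are loaded and started at 0x100

def bin_addresses_for_type5():
    """Return the LOAD,START pair for a type-5 file that still has its header."""
    # The header sits right below the program, so loading starts 16 bytes lower.
    load = TYPE5_START - EXOS_HEADER_LEN
    return "%X,%X" % (load, TYPE5_START)

print(bin_addresses_for_type5())
```

The result is the hex pair (without prefix or postfix) expected by the -bin switch.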

One advantage of this trick is being able to use the binary code point sync hinting mode, which is not available in the default EXOS input mode. Binary input mode can also be useful to make a hex dump of an EP file of unknown format (no -dasm is needed then of course, and probably -bin=0,0 is a good idea, even though the "starting address" is not really used for a hex dump!). One disadvantage of using -bin with files having EXOS headers is that the disassembler won't parse the EXOS header, and won't express the length information there with labels. However it's not a big price, and you can modify the source to do so if you really want to re-assemble it.

Note that the disassembler (see the DISASSEMBLER section) does not know about changes of the memory layout, so it's better to disassemble smaller parts at once, like page 0 of EXOS. As that page is mapped at C000 after startup (which is not handled by the disassembler), it's better to give the converter that address already. To really try that, you should have a file containing only page 0 of EXOS. On a UNIX (eg Linux) system, it's quite easy:

dd if=name_of_the_original_image of=exos-page0.rom bs=16384 count=1

Of course you should modify the name after if= :) You'll get your page 0 with the name after of= with this command.
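If dd is not available (eg on Windows), the same thing can be done with a few lines of Python. This is a minimal sketch, not part of the converter; the function name extract_page0 and the file names in the usage comment are only examples.

```python
def extract_page0(src_path, dst_path, page_size=16384):
    """Copy the first 16 KiB page of an image, like dd bs=16384 count=1."""
    with open(src_path, "rb") as src:
        page = src.read(page_size)      # reads at most page_size bytes
    with open(dst_path, "wb") as dst:
        dst.write(page)
    return len(page)                    # number of bytes actually copied

# Usage (file names are placeholders):
# extract_page0("name_of_the_original_image", "exos-page0.rom")
```

If the returned number is less than 16384, the source image was shorter than one page.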

Note that you can give more hex numbers with -bin, but two are compulsory, as we know: the first is the load address, the second is the start address (program entry point). Without -dasm the start address is quite meaningless, but you must specify it anyway, even if you only want a "nice" hex dump of the file. If you specify more hex numbers, they are passed to the code analyzer of the disassembler as hints for code entry points. To learn more about this, read the section DISASSEMBLER.

In general: using binary input mode, you can improve the quality of the disassembler's result by specifying more addresses, especially if you see that some parts of the code are not recognized as code, but dumped as data instead.

IS-BASIC

The original target of the project, thus the name "EPBAS". IS-BASIC programs are dumped, however currently there is no perfect match with "real" printing (on the EP) as the spaces used are different. It's on my to-do list.

"Multiple" BASIC programs type is not supported as I don't know what the hell they are, and also I haven't got any example to work with :(

IS-FORTH

IS-FORTH mode is currently experimental. Also, this was not tested too much yet.

If you examine an IS-FORTH program with syntax highlighting more closely, you will notice that almost everything is "red". That colour is used to show words which are part of the VLIST. It's not a mistake that eg the number 3 is red, but eg 99 is blue (numeric constant). It's because 3 is defined as a FORTH word ... The standard words were extracted from IS-FORTH directly. Defined (or redefined) words will also be red, as they are tracked. Do not be surprised, as in FORTH almost everything is just a defined word.

Also, the encoding of strings seems "odd" at first, ie a string needs a space at the beginning which is not part of the string. This is also not a mistake: the quotation mark is a FORTH word, and you need a separator so FORTH can recognize it.

ABSOLUTE SYSTEM EXTENSIONS

The absolute system extension type (type code 6) is handled, however the current support is the very same as for "new application program" (type 5), only the load/start address is different (0xC00A instead of 0x0100). The very same rules apply, ie hex dump without -dasm, and disassembly when used with that switch.

DISASSEMBLER

The disassembler tries to be intelligent: it's an iterating, two pass disassembler written by me. In the first pass it tries to follow the code flow, iterating at every point where the program flow can result in multiple choices for the next PC value, ie conditional jumps or RETs. The result of the first pass is stored only as "code hints", which create a map over the in-core memory image of the memory locations containing opcodes. All other memory locations are then treated as data. In the second pass the actual output is generated by walking through the code hint array. In case of a hit, the actual opcode is disassembled. Other locations are presented as data declarations. Another feature is the data hint array. For each opcode which would read/write a memory location, a data hint is set for that address. In data mode dump, as much data as possible is dumped in one line, as long as there is no data hint hit inside it. If the data is detected to be standard ASCII, an ASCII mode dump is done.
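The two pass idea can be illustrated with a toy example. This is NOT the code of the converter, only a minimal sketch under loud assumptions: it knows just five Z80 opcodes, it has no data hints and no fallback mode, and all the names (OPS, trace_code, listing) are made up for this illustration.

```python
# Toy subset of the Z80 instruction set: opcode -> (text, length, kind).
OPS = {
    0x00: ("NOP", 1, "plain"),
    0xC3: ("JP %s", 3, "jump"),       # unconditional: the path ends here
    0xCA: ("JP Z,%s", 3, "branch"),   # conditional: both ways are followed
    0xCD: ("CALL %s", 3, "branch"),   # call target plus the return point
    0xC9: ("RET", 1, "stop"),
}

def operand(mem, off):
    """Read the 16 bit operand (low byte first) of the opcode at off."""
    return mem[off + 1] | (mem[off + 2] << 8)

def trace_code(mem, load, start):
    """Pass 1: follow the code flow, return the set of opcode addresses."""
    end = load + len(mem)
    code, todo = set(), [start]
    while todo:
        pc = todo.pop()
        while load <= pc < end and pc not in code:
            op = OPS.get(mem[pc - load])
            if op is None or pc + op[1] > end:
                break                      # unknown or truncated opcode
            code.add(pc)
            if op[2] in ("jump", "branch"):
                todo.append(operand(mem, pc - load))
            if op[2] in ("jump", "stop"):
                break                      # flow does not fall through
            pc += op[1]
    return code

def listing(mem, load, code):
    """Pass 2: opcodes at code hints, data declarations elsewhere."""
    out, pc = [], load
    while pc < load + len(mem):
        off = pc - load
        if pc in code:
            text, length, _ = OPS[mem[off]]
            if length == 3:
                text = text % ("0x%04X" % operand(mem, off))
        else:
            text, length = "DB 0x%02X" % mem[off], 1
        out.append("%04X  %s" % (pc, text))
        pc += length
    return out

image = bytearray([0xCA, 0x07, 0x01,   # 0100: JP Z,0x0107
                   0xC3, 0x08, 0x01,   # 0103: JP 0x0108
                   0xFF,               # 0106: never reached, so data
                   0x00,               # 0107: NOP
                   0xC9])              # 0108: RET
hints = trace_code(image, 0x100, 0x100)
print("\n".join(listing(image, 0x100, hints)))
```

The byte at 0106 is never reached by the traced flow, so the second pass emits it as a DB line, while everything else is listed as code.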

The current hack is the ability to try to disassemble sections which cannot be reached by constant jumps/calls. This is called "fallback mode". It's not an ideal solution, as these sections can be data rather than code. To try to minimize these cases, a linear part of the code is marked as data as soon as a data label is found which references that area. Of course it's also not perfect.

If HTML mode (-html) is requested, anchors/links are used as with BASIC, so eg JPs can be followed by clicking on the addresses as well.

Please note that there is a major problem with ASCII mode data dump: the purpose of the whole converter project is to have a clean, UTF-8 representation of the EP encoding. However the purpose of the disassembler is to create source which can be assembled by JSASM, which would not tolerate UTF-8 sequences too well (as it does not know what kind of EP chars should be generated then). For this reason, the converter analyzes the text conversion table of the EP charset in use (-cset=...): only those bytes are treated as "character data" whose UNICODE position is the same as the EP-ASCII code of the given character. Also, the disassembler emits "ASCII" data parts only if at least 5 characters of continuous data are found, where every byte is in the interval of valid ASCII codes and has the very same ASCII code and Unicode position.
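The "at least 5 characters" rule can be sketched like this. It is only an illustration: the function name ascii_runs is made up, and the default set of "safe" bytes (printable ASCII, 0x20 to 0x7E) is an assumption. In the real converter the set depends on the conversion table of the selected EP charset.

```python
def ascii_runs(data, safe=frozenset(range(0x20, 0x7F)), min_len=5):
    """Return (offset, length) pairs for runs of at least min_len safe bytes."""
    data = bytearray(data)
    runs, start = [], None
    # One extra iteration past the end closes a run that reaches the last byte.
    for pos in range(len(data) + 1):
        ok = pos < len(data) and data[pos] in safe
        if ok and start is None:
            start = pos                      # a new run begins here
        elif not ok and start is not None:
            if pos - start >= min_len:
                runs.append((start, pos - start))
            start = None
    return runs

print(ascii_runs(b"\x00\x01HELLO\xffHI\x00WORLD!"))
```

Runs shorter than min_len (like "HI" in the example) are left out, so they would be dumped as plain data bytes.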

Since the disassembler is picky, and treats something as code only if it can reach it by following the code, it can fail to disassemble non-reachable parts, or code paths which can be reached only by a register jump, jump table etc, which can't be discovered by simple static code analysis. To help the disassembler you can give manual "code hinting points" (BUT only if you use the -bin switch!). To learn more about this topic, please check out the reference of the switches and the BINARY INPUT MODE section.

Technical notes about the disassembler:

It does handle not-so-standard DD, FD prefix sequences. A real Z80 accepts multiple DD/FD prefixes without problem, though it does not make too much sense. This is different behaviour from CB/ED, which is always followed by the actual opcode that should be interpreted together with the CB/ED.
The disassembler can't be smart enough to understand memory layout changes, eg memory mapping by the Dave registers on the EP, or even "manual relocation" of the code to another memory range. It's almost impossible to do; maybe emulating a Z80 and actually running the code could help to analyze situations like this, but it's fairly out of scope for this rather simple project. This also means that other EP segments mapped in (the system segment for example) are not handled so well: from the disassembler's point of view they can look like memory references to non-initialized data.
Also, if code jumps by register value, it cannot be analyzed, as a static code analyzer is not enough here.
The "compile-back" test (ie disassemble an executable with my converter, then try to assemble it with JSASM and check if the result is the very same binary as the original) may fail because there are multiple Z80 opcodes having the same meaning and asm token (eg undocumented versions of documented opcodes). Also, I had no willpower to check this very carefully, eg undocumented opcodes may not be handled by JSASM either (or it may have another name for those opcodes).
Because one purpose is the ability of re-assembling the disassembled source by JSASM, EXOS header is also written in the generated output, even if it's already described in the "info boxes" which can be requested by the -info switch.
Disassembler always creates a macro called "EXOS" which is the usual EXOS call sequence with the function code stored as a byte after RST.
The disassembler will surely go crazy if self modification is done, as code and data hint discovery would conflict in that case. Currently the code hint wins though.
AGAIN: this is my first try to write a Z80 disassembler! Also, no serious testing has been done. It's also possible that the disassembler part raises an exception which causes the converter to die with an error message. You're warmly welcome to send me patches though, as always (or at least bug reports).
In the future I may implement a "manual hinting" mechanism where you can control the behaviour of the disassembler manually (eg some memory locations can be given meaningful names, etc). It's quite a complex project, but in a nutshell, I can imagine that the generated asm file can be edited (by hand) and then re-read by the disassembler to use the edited info, so in the next pass the result will be better.
The link/anchor stuff can fail sometimes, eg producing a link for which there is no anchor. Anyway, it's "only" confusing, not a fatal issue.

Z80 ASSEMBLY SYNTAX ISSUES

Still about the Z80 disassembler. The "standard" Z80 assembly syntax is somewhat broken sometimes. For example check out this:

JP (HL)

For real there is NO instruction like this! (HL) would normally mean that the CPU should read the byte (or maybe word in this case ...) at the address in HL, and use that as the address to jump to. However this is not true at all, as this op simply jumps to the address in HL, so PC := HL.

So for real, the legal op should be written as:

JP HL

And yes, SJasm even supports this, surprise :) Maybe it was a mistake that the JP (HL) format is used so widely ...

Another fact which shows that JP HL is the correct form is the DD/FD prefix. As we know, if an instruction uses (HL), the DD/FD prefix causes it to use (IX+d) or (IY+d) instead. If an instruction does not use (HL) but uses HL, H or L, it's converted to IX/IY, IXH/IYH or IXL/IYL. If you find a Z80 opcode matrix and search for "DD E9" (E9 is the opcode of JP HL) you will see this:

JP IX

It does mean that the original (unprefixed) E9 opcode must have been JP HL, and _not_ JP (HL), as in that case the prefixed version would be JP (IX+d) and not JP IX.

I had to mention this topic, as it was reported as a bug. It is not; the bug is people using the JP (HL) format. As my disassembler is not based on a generated table, but parses opcodes at byte level, the logical way - JP HL - is used. If you tried to modify this, the prefixed version would turn out to be JP (IX+d), which would be incorrect.
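The byte level logic for this one opcode can be sketched as follows. This is only an illustration and not the real disassembler code; the function name decode_jp_reg and the INDEX_REG table are made up. It also shows the multiple DD/FD prefix case: on a real Z80 the last prefix before the opcode is the one that counts.

```python
INDEX_REG = {0xDD: "IX", 0xFD: "IY"}

def decode_jp_reg(code):
    """Decode [DD/FD ...] E9 into (mnemonic, length), or None if not E9."""
    code = bytearray(code)
    reg, pos = "HL", 0
    # Swallow any number of DD/FD prefixes; the last one wins.
    while pos < len(code) and code[pos] in INDEX_REG:
        reg = INDEX_REG[code[pos]]
        pos += 1
    if pos < len(code) and code[pos] == 0xE9:
        return "JP " + reg, pos + 1
    return None

for sample in (b"\xE9", b"\xDD\xE9", b"\xDD\xFD\xE9"):
    print(decode_jp_reg(sample))
```

The prefix only swaps the register name (HL to IX or IY) and no displacement byte is read, which is exactly why the JP HL form is the consistent one.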

Another similar anomaly is that in the "ALU" group of opcodes "A" is written as the first parameter for some opcodes, but not for others. From the point of view of logic it's totally meaningless, so my disassembler never generates the "A": no ADD A,B but only ADD B. Yes, SJasm supports this.

There is some oddity in how Z80 assembly syntax expresses some situations. Consider this:

LD somereg, label
LD somereg, (label)

It's clear what the difference is, but it's easy to mess this up by mistake. Some Z80 assemblers support syntax like:

LD somereg, #data

to signal immediate data (and not a memory reference) with '#'. However I'm using the "standard" way.

SJasm also allows [...] instead of (...), but this syntax is not supported here. The purpose of SJasm's [...] (I guess) is to avoid situations where (...) is used mathematically and is not meant to signal a memory reference ... ?

NAMING THE PROJECT

I should start to think about a better name for the project, because the original intent of the converter was a simple "list EP BASIC programs as text", but now it handles multiple header types, has some kind of intelligent disassembler, does character set conversions, and helps to create HTML references of charsets and of internal program flow for both BASIC and ML parts. Now even IS-FORTH and WP are supported.

IDEAS AND TODO

More and more "smart" disassembler functionality is to be implemented, with user controlled hinting (label names, "phase blocks", data/code selection, comments, etc), in such a way that a newly generated disasm listing contains all the user submitted changes, while making it easy to interface with an external GUI and/or web frontend for the user to do this. I may leave this work until after version 1.0.

Another, minor change could be handling the quite regular case when a program copies itself to another location with a simple LDIR opcode. As we (hopefully) know the register values before the LDIR, we can "fake" the result of the operation as well!

For 1.0, I'd like to clean the code up in many places (even reconstructing the whole program), fix bugs, and introduce clean python2/python3 compatibility without the current "hacks". Minor feature improvements (as with IS-FORTH support) can come meanwhile though. The new and clean code base will make it possible to introduce bigger changes then, like even more advanced disassembler features.

Short term goals:

IS-BASIC 100% correct display compared to a "real" Enterprise
TEST_ROM mode, similar to EXOS_ROM
Cleaned-up documentation
Test IS-FORTH support for programs with more than one buffer
Disassembler: correct track of register contents where possible
Disassembler: use tracked register values to xref EXOS call data pointers
Disassembler: append EXOS call info table with register usage

Longer term goals:

Re-structure the whole code, now it's a big mess, as the original purpose was to list IS-BASIC programs only.
UI and/or Web-UI support for hinting/disassembling to form a nice "interactive disassembler" like solution
Disassembler hinting support
Disassembler "PHASE block" support
Jump-inside an opcode problem and handling in disassembler
Handling memory references as base+offset in disassembler on request
Disassembler code/data manual choice support
Disassembler after-CALL-data support + hinting

SHORT FILE FORMAT DESCRIPTION

In my opinion, there is no "complete" documentation on files handled by EXOS. Information can be gathered from various places though, or experimenting with files & checking them in a hex editor. I try to summarize information I know. This is _FAR_ from being complete, so if you have any suggestion/help, please tell me. Zozo already was a great source :)

The Enterprise's OS (EXOS) uses a well-structured and nice scheme to manage even multiple modules inside a single file. EXOS files consist of one or more modules, each of which has a 16 byte long header. The first byte of the header must be zero, otherwise it's not an EXOS file. The next byte signals the type. The remaining bytes are usually zero, unless they are used by the specific type, in which case their meaning depends on that type. After the header (except for end-of-file) data follows; it's up to the specific format to tell how many bytes (the type also defines how to interpret the data bytes, of course).

Type codes:

0x00: not an EXOS file!
0x01: IS-FORTH program. See below [NOTE: some documents mark this as "unused"]
0x02: User relocatable module. Not supported.
0x03: IS-BASIC program (multiple). I don't know about this one!
0x04: IS-BASIC program (single). See below.
0x05: ML user program. See below.
0x06: Absolute system extension.
0x07: Relocatable system extension. Not supported.
0x08: WP files / saved editor buffer. See below.
0x09: LISP memory image file. Not supported.
0x0A: end-of-file
0x0B-0x1F: reserved for future usage?
0x20-0xFF: ??? unused/invalid
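
The header layout and the type codes above can be put into a small Python sketch. It is not the converter's real code: the names EXOS_TYPES and parse_exos_header are made up for this example, and the bytes after the type code are returned raw because their meaning depends on the type.

```python
EXOS_TYPES = {
    0x01: "IS-FORTH program",
    0x02: "User relocatable module",
    0x03: "IS-BASIC program (multiple)",
    0x04: "IS-BASIC program (single)",
    0x05: "ML user program",
    0x06: "Absolute system extension",
    0x07: "Relocatable system extension",
    0x08: "WP file / saved editor buffer",
    0x09: "LISP memory image file",
    0x0A: "end-of-file",
}

def parse_exos_header(data, offset=0):
    """Return (type_code, type_name, remaining_14_bytes) of a 16 byte header."""
    head = bytearray(data[offset:offset + 16])
    if len(head) < 16:
        raise ValueError("truncated EXOS header")
    # The first byte must be zero, and type code 0 is not a valid EXOS type.
    if head[0] != 0 or head[1] == 0:
        raise ValueError("not an EXOS file")
    name = EXOS_TYPES.get(head[1], "reserved/unused/invalid")
    return head[1], name, head[2:]

example = bytearray(16)
example[1] = 0x0A
print(parse_exos_header(example)[:2])
```

A file with several modules could be walked by calling the function again at the offset where the data of the previous module ends, which depends on the type.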

The end-of-file header is special: it signals the end of the file, and there is no more data after the header. One thing I can't really understand: some of the types above seem to be "terminating", with no other headers/modules following even without an end-of-file type, while some of them need the end-of-file. It seems to depend on the behaviour of the type: some module types cause control to be passed to the handler, so there is no point in putting more headers in the file, not even the end-of-file. For example this is the case with type-5.

It's important to note that these files can contain ASCII data. In this case, the bytes should be interpreted in the character set used by the Enterprise, and there are even multiple - different - character sets. That's why my converter needs this information: it maps EP chars into UTF-8 sequences based on the selected character set. The character set tables used by my program can be seen here: epbas.lgb.hu/result-chset.html It seems that (outside of Hungary) two main tables are used: the UK and the BRD (German).

IS-FORTH programs are simple: basically they are all text. An IS-FORTH program consists of one or more "buffers". The byte in the EXOS header after the type code gives the number of buffers. A buffer is always 2+1024 bytes long. The first two bytes form a word which tells the buffer "sequence" number. Please note that there is no need for strict ordering of the buffer numbers. The next 1024 bytes form the buffer itself. The unused bytes at the end of the buffer are filled with space characters (ASCII 0x20). The used area is divided into lines separated by standard CRLF sequences. It seems IS-FORTH programs lack the end-of-file header. Of course character conversion applies! Decoding IS-FORTH is really easy compared to eg IS-BASIC, as the program "stream" within the buffer is only text. However doing syntax highlighting etc is harder, as IS-BASIC programs carry structural information while IS-FORTH programs do not. I try to "tokenize" the buffer content using separator characters like space, filling the vlist array as well, so that word definitions can be linked in HTML mode. Comments are also more-or-less recognized, together with the built-in list of words.
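
The buffer layout can be sketched in Python like this. It is only an illustration with made-up names (forth_buffers, BUFFER_LEN). Two things are assumptions: that the buffers start right after the 16 byte EXOS header, and that the buffer number word is stored low byte first. No EP character set conversion is done here, the lines are returned as raw bytes.

```python
BUFFER_LEN = 1024

def forth_buffers(data):
    """Return {buffer_number: [raw lines]} for a type-1 (IS-FORTH) file."""
    data = bytearray(data)
    count = data[2]                 # the byte after the type code
    result, pos = {}, 16            # assumption: buffers follow the header
    for _ in range(count):
        chunk = data[pos:pos + 2 + BUFFER_LEN]
        if len(chunk) < 2 + BUFFER_LEN:
            raise ValueError("truncated IS-FORTH buffer")
        number = chunk[0] | (chunk[1] << 8)     # assumption: low byte first
        text = bytes(chunk[2:]).rstrip(b" ")    # drop the space padding
        result[number] = text.split(b"\r\n")
        pos += 2 + BUFFER_LEN
    return result
```

A dictionary is used because the buffer numbers do not have to come in strict order.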

WP documents (technically they are called "saved editor buffer documents" or such, so maybe not only WP can emit this kind of file): I don't know too much about this format, other than that it consists of character lines, each having 3 bytes of information at the beginning (as far as I can tell, pointers: editor buffers in memory are linked lists), which I skip, and a trailing byte. What I do is: after the 3 bytes at the beginning, I print the line for as long as the byte value is equal to or greater than 32 (space), then I skip a single byte again, and finally continue with the next line. I don't say it's the correct solution :) Of course, character conversion applies!
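
The line walking heuristic described above can be sketched as follows. The function name wp_lines is made up, the input is assumed to be the data after the EXOS header, and (just like the description) this is a guess at the format, not a definitive decoder. No character set conversion is done.

```python
def wp_lines(body):
    """Split saved editor buffer data into raw text lines (heuristic)."""
    body = bytearray(body)
    lines, pos = [], 0
    # Each line needs its 3 info bytes plus at least one more byte.
    while pos + 3 < len(body):
        pos += 3                      # skip the 3 bytes of information
        start = pos
        while pos < len(body) and body[pos] >= 32:
            pos += 1                  # collect the visible characters
        lines.append(bytes(body[start:pos]))
        pos += 1                      # skip the single trailing byte
    return lines

sample = b"\x01\x02\x03Hello\x0d" + b"\x04\x05\x06World\x0d"
print(wp_lines(sample))
```

The sample bytes are invented for the demonstration; real files may hold other values in the skipped positions.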

User ML programs are "machine language" binary streams. They're always loaded at address 0x100 in memory. There is not much structure in this file, however the EXOS header contains the length of the program (a word, low byte then high byte) after the type byte. If you want to disassemble the programs, or even do a hex dump, you must be careful with the bytes used to display/represent strings, as they are subject to character conversion. However, unlike with interpreted and "structured" files (like IS-BASIC), you can't be sure which bytes are data and which are not, of course ...

IS-BASIC programs are "complicated" because they use tokenization, a custom number representation, etc. (but note: it's said that IS-BASIC can load, and also save, programs as pure text files too). An IS-BASIC program consists of lines. A line begins with a single byte giving the length of the line. If it is zero, it signals the end of the program, and you should stop parsing there. The next two bytes are the line number (standard low then high byte). The end of the line is signaled by a zero byte (though it's redundant in my opinion, as the length byte already tells where the line ends).

The line itself consists of "marker" bytes (note: this is only my name for these entities) and possibly other information after specific markers. If the marker byte is below 0x20, then the "special sign" table is used to display some character. If the byte is below 0x60 (but not below 0x20, of course), then the marker's lower 5 bits give the length of a string that follows the marker and is decoded as a name (in EP charset). Byte 0x80 denotes a BASIC string: the next byte is the length, then that number of bytes follows. The string must be decoded (in EP charset) and surrounded by quotation marks. Marker byte 0x60 denotes a tokenized BASIC keyword: the next byte is an index into the token table. Some of the BASIC keywords ("untok_left") are special in that the rest of the line should be printed as-is (with EP charset conversion though, of course), with no marker byte interpretation, etc. Marker bytes 0xA2 and 0xC2 mean a two-byte integer constant following the marker byte, as low then high byte. There are two different markers because 0xA2 is used for BASIC line references (GOTO, etc). 0xC6 signals a float number.

Unlike other BASIC dialects, IS-BASIC does not use "standard" floating point math but a (packed) BCD encoded scheme. This reduces the representable number space somewhat (and is said to be slower). However, compared to base-2 math, it is accurate from the viewpoint of the base-10 math humans are used to working with (eg 0.1 cannot be stored precisely in base-2, which can cause funny surprises even with simple FOR-NEXT loops). To really understand the BCD encoded floats, it's better to read the source code: explaining it is harder than reading the code. The special sign table and the BASIC token table can be seen in the source too, or can be viewed by specifying the -db command line switch, or by visiting this page (which is the output of a call with -db): epbas.lgb.hu/result-dist-db.txt As far as I can tell, other values of the marker byte are invalid; at least I throw an error for other values.
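The line and marker rules above can be sketched as a small Python walker. This is a simplified illustration, not the parser used by the script, and it rests on two assumptions that the text does not state: that the length byte counts the whole line record (including itself and the two line-number bytes), and that a float marker is followed by 6 bytes (suggested by the low bits of 0xC6). The names split_lines, tokenize and float_size are made up for this example. Table lookups, charset conversion and the "untok_left" keywords are left out.

```python
def split_lines(program: bytes):
    """Yield (line_number, body) pairs until the zero length byte.

    Assumption: the length byte counts the whole line record, including
    itself and the two line-number bytes.
    """
    pos = 0
    while pos < len(program) and program[pos] != 0:
        length = program[pos]
        if length < 3 or pos + length > len(program):
            raise ValueError(f"bad line length at offset {pos}")
        line_number = program[pos + 1] | (program[pos + 2] << 8)
        yield line_number, program[pos + 3:pos + length]
        pos += length


def tokenize(body: bytes, float_size: int = 6):
    """Yield (kind, payload) for each marker byte of a line body.

    Assumption: a float marker (0xC6) is followed by float_size bytes.
    """
    pos = 0
    while pos < len(body):
        marker = body[pos]
        pos += 1
        if marker == 0x00:              # zero byte: end of the line
            return
        if marker < 0x20:               # index into the special sign table
            yield "special", marker
        elif marker < 0x60:             # name, length in the lower 5 bits
            size = marker & 0x1F
            yield "name", body[pos:pos + size]
            pos += size
        elif marker == 0x60:            # keyword: index into the token table
            yield "keyword", body[pos]
            pos += 1
        elif marker == 0x80:            # string: length byte, then the bytes
            size = body[pos]
            yield "string", body[pos + 1:pos + 1 + size]
            pos += 1 + size
        elif marker in (0xA2, 0xC2):    # 16-bit integer, low byte first
            value = body[pos] | (body[pos + 1] << 8)
            yield ("line_ref" if marker == 0xA2 else "integer"), value
            pos += 2
        elif marker == 0xC6:            # BCD float, returned undecoded
            yield "float", body[pos:pos + float_size]
            pos += float_size
        else:
            raise ValueError(f"invalid marker byte 0x{marker:02X}")
```

A caller would loop over split_lines(data) and feed each body to tokenize(), then map the "special" and "keyword" payloads through the tables dumped by -db.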

CHANGELOG

Version 0.5.8: Some restructuring work on the image support, as more formats will need it. An image can now even be saved with -savepic (new option!).
Version 0.5.7: Type-0Bh support ("VLOAD" image, but it's a bit confusing, as it seems it can mean different things)! Decoding the image as an inlined IMG tag in HTML mode for PIXEL gfx mode (2/4/16/256 colour images). The format does not contain the palette, so I use the default one. Many anomalies though: no support for modes other than PIXEL, python3 incompatibilities, needs PIL support for python to work (the online version should work though).
Version 0.5.6: HEASS source file support (EXOS header 0x80). Multiple BASIC exos type support, size and "program" in info mode. TODO: check/use size info in header for IS-BASIC files. self.exosheader usage instead of self.prg backreference at various places.
Version 0.5.5: Documentation is HTML based (text is generated, not the opposite - as before), and extended with format descriptions. Forth number base tracking, string highlighting is more intuitive from the point of view of FORTH. EXOS_ROM mode compatibility issue with py3k is fixed. Documentation is a mess anyway, should be cleaned up a lot! IS-FORTH buffer handling is modified: buffer "sequence" is word (not byte) (no type info, that's the high byte of the word!) and no strict sequence is needed. Various minor fixes.
Version 0.5.4: Preliminary syntax highlighting (VLIST was exported from IS-FORTH directly), definition tracking, links&anchors in HTML mode for IS-FORTH, more information in info mode, using hex/debug mode if requested. Preliminary support for "absolute system extension". Currently this means following the very same code path as with type-5 (user app) programs, just with a different start/load address (0xC00A instead of 0x0100). On-line mode is extended with the ability to turn disasm on/off. Preliminary support of EXOS_ROM images (without -bin switch!) with/without -dasm switch. Currently, only 16K images are supported.
Version 0.5.3: Python2/3 compatibility test & problems & fixes, however Python2 is still recommended over py3k! Slightly modified web page (epbas.lgb.hu), also at a new location. The project is somewhat renamed. The new page includes test results for IS-FORTH and disasm mode in the index page as well. "Serious" bug fix: the wrong PC value was printed as a comment on opcodes, the PC of the next opcode, not the current one! Some of the "block" instructions used unofficial names (SJadm did not recognize them). SLL was written as SLS (typo). HTML template fix. Online mode parameter passing via select menus. Some of the unused code has been removed. New tests have been added to my own pre-release test suite (with generating bin images, etc).
Version 0.5.2: I/O port descriptions and the EXOS function table are extended with more data (though EXOS in/out parameters are missing), DB dump mode dumps EXOS functions and I/O ports as well.
Version 0.5.1: IS-FORTH EXOS type support! Currently without any syntax highlighting and/or link+anchor mode. Also, the current solution is highly untested ...
Version 0.5: Disassembler works "quite well" with multiple iterations, code/data guessing, xref tables, EXOS and I/O port description, etc, etc. Still, major work is left to have a "really good" disassembler, even with user interactions. Anyway, this is a good start ...
Version 0.4.2: Disassembler works on opcode level, main program can dump "nice" Z80 opcode matrix.
Version 0.4.1: Using anchors/links in HTML mode (mainly only the handler code, it must be adapted). Binary image mode. Started writing the skeleton of the code/data analyzer for the disassembler.
Version 0.4: Start of disasm mode for type-5, currently only hex dump :) Better handling of end-of-file, whether caused by a type-10 header or by type-5 being the last processed component.
Version 0.3: Uses incremental EXOS file/header parsing, ie the converter can decode files with more "items" within the same file (if all of the "items" are supported anyway).

... it was too early to write changelog at the beginning ...

Any help/feedback is welcome!