The HTML Beautifier (located in the HTML menu) indents your HTML page just as code beautifiers do. If the page contains legacy HTML, however, problems can arise that code beautifiers never encounter. Most of them are caused by inconsistent HTML syntax: some tags have closing partners, some do not, and some have closing partners only some of the time, with both forms being valid syntax. Example:
My paragraph<p> (legal)
<p align=left>My paragraph</p> (also legal)
Because both these forms are valid HTML, and because there are many such valid examples, there is no meaningful way to write a reliable, robust beautifier for legacy HTML, something I discovered over years of trying. But these problems are solved by converting to XHTML, the main intent of the most recent Arachnophilia versions.
The previous major version of Arachnophilia (4.0) had a much more ambitious beautifier that also tried to correct HTML syntax. Eventually, because of the inconsistencies in legacy HTML, both these features (beautify and validate) were downplayed and one was dropped entirely. In the new Arachnophilia (version 5.3 and newer), because of the consistency provided by XHTML, HTML Beautify and HTML Validate are separate functions, and both work as intended if the page contains valid XHTML syntax.
If you want to keep your pages as they are (that is, without converting to XHTML) but still want to use the advanced Arachnophilia features, just make these changes:
- Make sure that every tag in your page either has a closing partner:
<open>content</close>
or is self-closing:
<self-contained tag/>
- Examples of tags that should have closing partners, but did not in earlier versions of Arachnophilia, include <li></li> and <option></option>.
- Examples of tags that should be made <self-closing/> include <img ... />, <input ... />, <frame .../>, <meta ... /> and about a half-dozen others. The point is there should never be a tag without <either> a </partner> or the special <self-closing syntax/>.
- If this system is followed, you can update your pages with a minimum of effort, and your pages will be more acceptable to browsers as well as Arachnophilia.
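As a rough illustration of the rule above, here is a minimal Python sketch that lists tags having neither a closing partner nor the self-closing syntax. It is not Arachnophilia's own validator; it relies on the standard library's html.parser module, and the names UnclosedTagFinder and find_unclosed_tags are hypothetical, chosen only for this example.

```python
from html.parser import HTMLParser


class UnclosedTagFinder(HTMLParser):
    """Collects tags that have no closing partner and are not self-closing."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of (tag, line, column) still awaiting a partner
        self.unclosed = []   # tags skipped over when an outer tag was closed

    def handle_starttag(self, tag, attrs):
        line, column = self.getpos()
        self.open_tags.append((tag, line, column))

    def handle_startendtag(self, tag, attrs):
        # <tag ... /> is self-closing, so there is nothing to track.
        pass

    def handle_endtag(self, tag):
        # Look for the nearest open tag with the same name.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i][0] == tag:
                # Anything opened after it was never closed.
                self.unclosed.extend(self.open_tags[i + 1:])
                del self.open_tags[i:]
                return
        # A closing tag with no opening partner is ignored in this sketch.


def find_unclosed_tags(html_text):
    """Return (tag, line, column) for every tag lacking a partner, in page order."""
    finder = UnclosedTagFinder()
    finder.feed(html_text)
    finder.close()
    problems = finder.unclosed + finder.open_tags
    return sorted(problems, key=lambda item: (item[1], item[2]))


if __name__ == "__main__":
    page = '<ul><li>One<li>Two</li></ul><img src="a.png"><br/>'
    for tag, line, column in find_unclosed_tags(page):
        print(f"<{tag}> at line {line}, column {column} has no closing partner")
```

A page that follows the system described above produces an empty report, because every tag is either paired or written as <tag ... />.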
This is a bit of an editorial comment. I strongly recommend that the reader consider converting his or her pages to XHTML. This change allows much greater control over page form and content, and will prevent the eventual abandonment of Web pages that are not internally consistent.
As time has passed, I have done all in my power to automate Web page creation, and Arachnophilia represents some steps in this direction (although script writing is the primary approach to automation). In the case of legacy HTML Web pages, it is very difficult to apply automated methods because page content cannot be relied on to follow consistent rules. XHTML was created to correct the glaring deficiencies in HTML, and Arachnophilia is now oriented toward XHTML, to the degree that some of its advanced features won't work with HTML any more.
Beginning with Arachnophilia Version 5.4, Unicode is supported by way of the UTF-8 character set. File loading and saving, and clipboard operations, support Unicode characters and files. This change has been made in response to a number of inquiries about internationalization issues, and I made this change in a way that won't affect existing documents.
But the fact that Arachnophilia supports Unicode doesn't mean that typical browsers will suddenly and automatically support Unicode. A Unicode character sequence such as "Здравствуйте" may or may not appear correctly while editing in Arachnophilia, and it may or may not appear correctly in your browser or that of your visitors. All these issues depend on the availability of fonts to support the characters represented by the Unicode content.
One partial solution to compatibility issues with respect to international Unicode characters is to convert them into HTML entities using the menu item "HTML ... More Functions ... Html to Entity". For example, the small sample given above would be converted like this:
"Здравствуйте" -> "&#1047;&#1076;&#1088;&#1072;&#1074;&#1089;&#1090;&#1074;&#1091;&#1081;&#1090;&#1077;"
The problem with this sort of conversion is that the result is no longer suitable for editing, a problem solved by temporarily converting back from entities to characters using "HTML ... More Functions ... Entity to HTML".
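The round trip described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code behind Arachnophilia's menu items. It assumes that the entities are decimal numeric character references and that only characters outside the ASCII range are converted. The function names are hypothetical.

```python
import re

# Matches decimal numeric character references such as &#1047;
_NUMERIC_ENTITY = re.compile(r"&#(\d+);")


def chars_to_entities(text):
    """Replace every non-ASCII character with a decimal numeric entity."""
    return "".join(c if ord(c) < 128 else f"&#{ord(c)};" for c in text)


def entities_to_chars(text):
    """Turn decimal numeric entities for non-ASCII characters back into characters."""

    def decode(match):
        code = int(match.group(1))
        # Leave ASCII-range and out-of-range references untouched so that
        # markup-sensitive entities survive the round trip.
        if 127 < code <= 0x10FFFF:
            return chr(code)
        return match.group(0)

    return _NUMERIC_ENTITY.sub(decode, text)


if __name__ == "__main__":
    sample = "Здравствуйте"
    encoded = chars_to_entities(sample)
    print(encoded)
    print(entities_to_chars(encoded) == sample)
```

Because plain ASCII text and markup pass through unchanged, the converted page contains only 7-bit characters, which is why this form tends to survive transfers between systems that disagree about character encoding.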
The other problem with using Unicode (UTF-8) characters is that entity conversion solves only certain trivial browser compatibility issues; it doesn't get around the issue of display fonts. To display Unicode characters within a document in Arachnophilia, your system must have the correct fonts, and a suitable Unicode font must be chosen using the font selector located at "Text ... Set Editor Font". To see Unicode characters in program prompts and macros, you must also select a Unicode font for the program itself, using menu item "Text ... Set Program Font" (in most cases the same font can be used for both choices). To see the intended Unicode characters in your browser, the browser must also have a suitable font installed. Finally, visitors to your Web site need to install appropriate fonts in order to view your Unicode content. All these problems should be considered before adding Unicode content to your Web pages.
Remember that even if you have arranged to see the right characters in Arachnophilia and in your own development Web browser, this doesn't mean your visitors will be able to see the content you intend. For that, the visitor must have a Unicode-compatible browser and the correct fonts installed.
One more note about Unicode: choosing an appropriate editing font with the dialog located at "Text ... Set Editor Font" doesn't change your document's content; it only changes how the content is displayed in the editor. If while editing you see little blocks instead of characters, the document may still display correctly when viewed with a browser. Conversely, the appearance of the correct characters in the editor doesn't assure that the resulting Web page will display correctly for a visitor to your site.
To become familiar with Unicode, a good way to start is to acquire some Unicode fonts. For experimentation I recommend Bitstream Cyberbit, a font available from multiple sources, and an attempt at a "universal" Unicode font (one that covers a lot of languages). You may eventually settle for a less ambitious font, one that supports fewer languages, but this font is a good way to try out Unicode.
- Download the font from one of the listed sources (the font should be free).
- Install the font using your operating system's font management tools.
- Run Arachnophilia, then select menu item "Text ... Set Editor Font". Choose "Bitstream Cyberbit" from the drop-down list.
- To allow program prompts and macros to display correctly, select menu item "Text ... Set Program Font". Again, choose "Bitstream Cyberbit" from the drop-down list.
- Test the editor's ability to manage Unicode by copying some foreign-language text into the editor:
- English: Hello
- Russian: Здравствуйте
- Japanese: こんにちは
- Chinese: 你好
- Korean: 여보세요
- Arabic: مرحبا
- Hebrew: שלום
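If copying the samples by hand is inconvenient, a UTF-8 test file containing the same greetings can be generated and then opened in the editor. The Python sketch below is one way to do that. The file name unicode_test.txt and the function names are assumptions made for this example.

```python
from pathlib import Path

# The sample greetings from the list above.
GREETINGS = {
    "English": "Hello",
    "Russian": "Здравствуйте",
    "Japanese": "こんにちは",
    "Chinese": "你好",
    "Korean": "여보세요",
    "Arabic": "مرحبا",
    "Hebrew": "שלום",
}


def build_sample_text(greetings=GREETINGS):
    """Return one 'Language: greeting' line per entry."""
    return "\n".join(f"{lang}: {word}" for lang, word in greetings.items()) + "\n"


def write_sample_file(path="unicode_test.txt"):
    """Write the sample text as UTF-8 and return the path that was written."""
    target = Path(path)
    target.write_text(build_sample_text(), encoding="utf-8")
    return target


if __name__ == "__main__":
    print(f"Wrote {write_sample_file()}")
```

If some lines of the resulting file show little blocks in the editor, the selected editor font does not cover that language; the file content itself is unaffected.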
Here are some additional notes about Arachnophilia's Unicode support:
- If a file's character encoding is not known, one easy way to import it into Arachnophilia is to use the system clipboard. Simply open a blank document in Arachnophilia and paste the document content into it. This trick works because clipboards normally use a more consistent character encoding than files do.
- A new submenu named "Character Encoding" has been added to the "Text" menu category. It contains some common character encodings. Clicking one of these character encodings changes the encoding used during subsequent read and write operations. The user may want to add to this list using the Arachnophilia Macro Editor.
- If your documents contain extended characters that are not Unicode (e.g. using ISO-8859-1 as one example), you may experience difficulties importing them into Arachnophilia while using the default character encoding (UTF-8). The solution is to set the character encoding using "Text ... Character Encoding" in advance of importing the files (or use the clipboard as explained above). Then it is best to convert the extended characters into HTML Entities using "HTML ... More Functions ... Char to Entity" — this conversion step also assures a high degree of browser compatibility.
- A new system command has been added that allows the user to choose a document read/write character set other than the default UTF-8. The command is [FileEncoding] and, if included in a macro, it can be used to choose from a large number of character encodings (more than offered by the provided menu selection). When invoked as [FileEncoding:ISO-8859-1], this command will change subsequent file operations to use the ISO-8859-1 character set. When invoked as [FileEncoding] this command reverts to the default UTF-8 character encoding. Click here to read more about writing macros.
- In the long term it is best to choose a single document character encoding, and for a number of reasons UTF-8 is a very good choice.
- For those who would like to prepare Arachnophilia for use in non-English-language environments, remember that nearly all Arachnophilia menu and toolbar content can be translated into foreign languages using Unicode character sets, and the resulting program content can be exported as a distributable file. Such a translation activity would be carried out on Arachnophilia's macro content, as explained here.
- If you are creating content in a language that reads right-to-left, choose the "plain text" editor mode (right-click, "Change to plain display"). This mode automatically manages typing direction and has a distinctive text cursor that shows the current editing direction.
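To see why the read/write encoding matters when importing non-Unicode files, consider the following Python sketch. It illustrates the general problem only and says nothing about how Arachnophilia handles files internally. The function names are hypothetical.

```python
def decode_with(raw_bytes, encoding="utf-8"):
    """Decode bytes with the given encoding, or return None if they don't fit."""
    try:
        return raw_bytes.decode(encoding)
    except UnicodeDecodeError:
        return None


def convert_encoding(raw_bytes, source_encoding, target_encoding="utf-8"):
    """Re-encode file content from one character encoding to another."""
    return raw_bytes.decode(source_encoding).encode(target_encoding)


if __name__ == "__main__":
    # "café" as it would be stored in an ISO-8859-1 file.
    legacy_bytes = "café".encode("iso-8859-1")

    # Read with the wrong (default) encoding: the extended character is invalid.
    print(decode_with(legacy_bytes))

    # Read with the encoding the file was actually written in.
    print(decode_with(legacy_bytes, "iso-8859-1"))

    # Once decoded correctly, the content can be saved as UTF-8.
    print(convert_encoding(legacy_bytes, "iso-8859-1"))
```

The same bytes are readable or unreadable depending only on the encoding assumed by the reader, which is why the encoding should be selected before the files are imported.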
At risk of repeating myself, remember the Unicode pitfalls. If you don't see characters rendered correctly in Arachnophilia, this doesn't mean they won't be rendered correctly in a browser. Conversely, if you see characters rendered correctly in Arachnophilia, this doesn't assure they will appear correctly on your development browser or on the browsers of your site's visitors. All these outcomes depend on selecting and installing appropriate fonts.