Difference between revisions of "Silme:Tutorial:FormatParser"

From Braniecki's Wiki
 
Latest revision as of 18:35, 16 February 2009

In the previous chapter we talked about loading and saving a file.

Fortunately, we managed to do that without thinking about how the library knows how to parse files. This chapter covers that topic.

In most cases IOClient can read and write a stream of data to a file system, a zip package or an RCS repository. But because we work on localization tools, we need to operate on the L10nObject structure, which holds entities, comments and potentially other data. To do this, we have to parse the input string into an object and later serialize the object back into a string.

For this we have a FormatParser package.

FPManager

FPManager is very similar to IOManager and serves as a controller class for FormatParsers.

import silme.core
import silme.format

fp = silme.format.Manager.get('dtd')

string = '<!ENTITY test "value">\n\n<!ENTITY test2 "value2">\n\n\n'

l10nobject = fp.get_l10nobject(string)

In many cases you will not need to use FPManager directly, because IOClient calls it for you automatically. You control the list of formats the system supports by adding or removing imports.

When you import a format parser, it will be used while reading L10nPackages and L10nObjects without any more work. But you can also use FormatParsers on your own if you want.

In a similar way to IOManager, FPManager can also guess which format to return depending on the path.

import silme.core
import silme.format

silme.format.Manager.register('dtd', 'gettext', 'properties', 'xliff')

fp = silme.format.Manager.get(path='./test/file.po') # will return GetTextFormatParser
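Path-based lookup of this kind can be pictured as a small registry keyed by file extension. The sketch below is only an illustration of the idea, not Silme's implementation: the `register` and `guess_format` functions and the extension mappings are hypothetical names chosen for this example.

```python
import os

# Hypothetical registry: maps a file extension to a format name.
# The names and mappings are illustrative, not Silme's internals.
_EXTENSIONS = {}


def register(name, *extensions):
    """Associate one or more file extensions with a format name."""
    for ext in extensions:
        _EXTENSIONS[ext.lower()] = name


def guess_format(path):
    """Return the registered format name for a path, or None if unknown."""
    ext = os.path.splitext(path)[1].lower()
    return _EXTENSIONS.get(ext)


register('dtd', '.dtd')
register('gettext', '.po', '.pot')
register('properties', '.properties')

print(guess_format('./test/file.po'))    # a registered extension
print(guess_format('./test/notes.txt'))  # an unregistered extension
```

A manager built this way only knows the formats that were registered, which matches the idea that the list of supported formats is under your control.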

FormatParser

reading

In the simplest case, you can use a FormatParser to parse a string into an L10nObject or an EntityList:

import silme.core
import silme.format

fp = silme.format.Manager.get('dtd')

string = '<!ENTITY test "value">\n\n<!ENTITY test2 "value2">\n\n\n'

l10nobject = fp.get_l10nobject(string)

entitylist = fp.get_entitylist(string)

Building an EntityList is faster because the parser ignores strings, comments and all other content that is not an entity, even if that content is potentially meaningful.
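To make the difference between the two results concrete, here is a minimal sketch that does not use Silme at all. It assumes a deliberately simplified entity syntax, and `parse_entities` and `parse_structure` are hypothetical helpers written for this example: the first keeps only id/value pairs, the second also keeps the text between entities.

```python
import re

# Simplified pattern for <!ENTITY name "value">; real DTD syntax is richer.
ENTITY_RE = re.compile(r'<!ENTITY\s+(\S+)\s+"([^"]*)">')


def parse_entities(text):
    """Entity-only parse: keep (id, value) pairs and drop everything else."""
    return ENTITY_RE.findall(text)


def parse_structure(text):
    """Full parse: keep entities and the raw text between them, in order."""
    items, pos = [], 0
    for match in ENTITY_RE.finditer(text):
        if match.start() > pos:
            items.append(('text', text[pos:match.start()]))
        items.append(('entity', match.group(1), match.group(2)))
        pos = match.end()
    if pos < len(text):
        items.append(('text', text[pos:]))
    return items


source = '<!-- note -->\n<!ENTITY test "value">\n\n<!ENTITY test2 "value2">\n'
entities = parse_entities(source)
structure = parse_structure(source)

print(entities)        # only the id/value pairs
print(len(structure))  # entities plus the surrounding text fragments
```

The entity-only result is smaller and cheaper to build, but the comment and the blank lines from the source are no longer available anywhere in it.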

writing

But the major drawback of using an EntityList instead of an L10nObject is that you lose the ability to write it back to the original file and get a 100% identical result.

FormatParser can serialize any EntityList or L10nObject, but it falls back to a generic template if it is not writing an L10nObject, or if the L10nObject was not read from the same format.

import silme.core
import silme.format

silme.format.Manager.register('dtd', 'po')

fp = silme.format.Manager.get('dtd')
fp2 = silme.format.Manager.get('po')
string = '<!ENTITY test "value">\n\n<!ENTITY test2 "value2">\n\n\n'
l10nobject = fp.get_l10nobject(string)
entitylist = fp.get_entitylist(string)

string2 = fp2.write_l10nobject(l10nobject) # creates a very simple PO string with the data from the L10nObject

string3 = fp.write_entitylist(entitylist) # creates a simple DTD string containing nothing besides the generic entities from the list
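The effect of writing with a generic template can be sketched without Silme. The example below assumes the entity list is a plain list of (id, value) pairs and uses a hypothetical `write_entities_generic` helper with a one-entity-per-line layout; the template Silme actually uses may differ.

```python
def write_entities_generic(entities):
    """Write (id, value) pairs using a bare one-entity-per-line DTD layout."""
    return ''.join('<!ENTITY %s "%s">\n' % (eid, value)
                   for eid, value in entities)


original = '<!-- greeting -->\n<!ENTITY test "value">\n\n<!ENTITY test2 "value2">\n'

# What an entity-only parse of the original would keep.
entities = [('test', 'value'), ('test2', 'value2')]

generated = write_entities_generic(entities)
print(generated)
print(generated == original)  # the comment and blank line are not reproduced
```

The output is valid and carries every entity, but it cannot match the original file because the information about comments and spacing was never stored.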

Standalone parsers

In some cases you may want to use a format without the rest of Silme. Maybe you have your own application or custom needs and could use a good DTD parser or Gettext serializer, but you don't want to use the whole system of Managers and Clients.

Each Format is a package made of:

  • silme.format.%name%.Parser
  • silme.format.%name%.Serializer
  • silme.format.%name%.Structure


You can take these modules and use them independently of the rest of Silme.

In the next chapter we'll talk about ways to find the differences between two entity lists and how to store them.