dongsheng@623: #!/usr/bin/perl -w dongsheng@623: dongsheng@623: require Exporter; dongsheng@623: dongsheng@623: package Locale::Po4a::TransTractor; dongsheng@623: use DynaLoader; dongsheng@623: dongsheng@623: use 5.006; dongsheng@623: use strict; dongsheng@623: use warnings; dongsheng@623: dongsheng@623: use subs qw(makespace); dongsheng@623: use vars qw($VERSION @ISA @EXPORT); dongsheng@623: $VERSION="0.36"; dongsheng@623: @ISA = qw(DynaLoader); dongsheng@623: @EXPORT = qw(new process translate dongsheng@623: read write readpo writepo dongsheng@623: getpoout setpoout); dongsheng@623: dongsheng@623: # Try to use a C extension if present. dongsheng@623: eval("bootstrap Locale::Po4a::TransTractor $VERSION"); dongsheng@623: dongsheng@623: use Carp qw(croak); dongsheng@623: use Locale::Po4a::Po; dongsheng@623: use Locale::Po4a::Common; dongsheng@623: dongsheng@623: use File::Path; # mkdir before write dongsheng@623: dongsheng@623: use Encode; dongsheng@623: use Encode::Guess; dongsheng@623: dongsheng@623: =head1 NAME dongsheng@623: dongsheng@623: Locale::Po4a::TransTractor - Generic trans(lator ex)tractor. dongsheng@623: dongsheng@623: =head1 DESCRIPTION dongsheng@623: dongsheng@623: The po4a (po for anything) project goal is to ease translations (and more dongsheng@623: interestingly, the maintenance of translations) using gettext tools on dongsheng@623: areas where they were not expected like documentation. dongsheng@623: dongsheng@623: This class is the ancestor of every po4a parsers used to parse a document to dongsheng@623: search translatable strings, extract them to a po file and replace them by dongsheng@623: their translation in the output document. dongsheng@623: dongsheng@623: More formally, it takes the following arguments as input: dongsheng@623: dongsheng@623: =over 2 dongsheng@623: dongsheng@623: =item - dongsheng@623: dongsheng@623: a document to translate ; dongsheng@623: dongsheng@623: =item - dongsheng@623: dongsheng@623: a po file containing the translations to use. dongsheng@623: dongsheng@623: =back dongsheng@623: dongsheng@623: As output, it produces: dongsheng@623: dongsheng@623: =over 2 dongsheng@623: dongsheng@623: =item - dongsheng@623: dongsheng@623: another po file, resulting of the extraction of translatable strings from dongsheng@623: the input document ; dongsheng@623: dongsheng@623: =item - dongsheng@623: dongsheng@623: a translated document, with the same structure than the one in input, but dongsheng@623: with all translatable strings replaced with the translations found in the dongsheng@623: po file provided in input. dongsheng@623: dongsheng@623: =back dongsheng@623: dongsheng@623: Here is a graphical representation of this: dongsheng@623: dongsheng@623: Input document --\ /---> Output document dongsheng@623: \ / (translated) dongsheng@623: +-> parse() function -----+ dongsheng@623: / \ dongsheng@623: Input po --------/ \---> Output po dongsheng@623: (extracted) dongsheng@623: dongsheng@623: =head1 FUNCTIONS YOUR PARSER SHOULD OVERRIDE dongsheng@623: dongsheng@623: =over 4 dongsheng@623: dongsheng@623: =item parse() dongsheng@623: dongsheng@623: This is where all the work takes place: the parsing of input documents, the dongsheng@623: generation of output, and the extraction of the translatable strings. This dongsheng@623: is pretty simple using the provided functions presented in the section dongsheng@623: "INTERNAL FUNCTIONS" below. See also the synopsis, which present an dongsheng@623: example. dongsheng@623: dongsheng@623: This function is called by the process() function bellow, but if you choose dongsheng@623: to use the new() function, and to add content manually to your document, dongsheng@623: you will have to call this function yourself. dongsheng@623: dongsheng@623: =item docheader() dongsheng@623: dongsheng@623: This function returns the header we should add to the produced document, dongsheng@623: quoted properly to be a comment in the target language. See the section dongsheng@623: "Educating developers about translations", from L, for what dongsheng@623: it is good for. dongsheng@623: dongsheng@623: =back dongsheng@623: dongsheng@623: =cut dongsheng@623: dongsheng@623: sub docheader {} dongsheng@623: dongsheng@623: sub parse {} dongsheng@623: dongsheng@623: =head1 SYNOPSIS dongsheng@623: dongsheng@623: The following example parses a list of paragraphs beginning with "

". For the sake dongsheng@623: of simplicity, we assume that the document is well formatted, i.e. that '

' dongsheng@623: tags are the only tags present, and that this tag is at the very beginning dongsheng@623: of each paragraph. dongsheng@623: dongsheng@623: sub parse { dongsheng@623: my $self = shift; dongsheng@623: dongsheng@623: PARAGRAPH: while (1) { dongsheng@623: my ($paragraph,$pararef)=("",""); dongsheng@623: my $first=1; dongsheng@623: my ($line,$lref)=$self->shiftline(); dongsheng@623: while (defined($line)) { dongsheng@623: if ($line =~ m/

/ && !$first--; ) { dongsheng@623: # Not the first time we see

. dongsheng@623: # Reput the current line in input, dongsheng@623: # and put the built paragraph to output dongsheng@623: $self->unshiftline($line,$lref); dongsheng@623: dongsheng@623: # Now that the document is formed, translate it: dongsheng@623: # - Remove the leading tag dongsheng@623: $paragraph =~ s/^

//s; dongsheng@623: dongsheng@623: # - push to output the leading tag (untranslated) and the dongsheng@623: # rest of the paragraph (translated) dongsheng@623: $self->pushline( "

" dongsheng@623: . $document->translate($paragraph,$pararef) dongsheng@623: ); dongsheng@623: dongsheng@623: next PARAGRAPH; dongsheng@623: } else { dongsheng@623: # Append to the paragraph dongsheng@623: $paragraph .= $line; dongsheng@623: $pararef = $lref unless(length($pararef)); dongsheng@623: } dongsheng@623: dongsheng@623: # Reinit the loop dongsheng@623: ($line,$lref)=$self->shiftline(); dongsheng@623: } dongsheng@623: # Did not get a defined line? End of input file. dongsheng@623: return; dongsheng@623: } dongsheng@623: } dongsheng@623: dongsheng@623: Once you've implemented the parse function, you can use your document dongsheng@623: class, using the public interface presented in the next section. dongsheng@623: dongsheng@623: =head1 PUBLIC INTERFACE for scripts using your parser dongsheng@623: dongsheng@623: =head2 Constructor dongsheng@623: dongsheng@623: =over 4 dongsheng@623: dongsheng@623: =item process(%) dongsheng@623: dongsheng@623: This function can do all you need to do with a po4a document in one dongsheng@623: invocation. Its arguments must be packed as a hash. ACTIONS: dongsheng@623: dongsheng@623: =over 3 dongsheng@623: dongsheng@623: =item a. dongsheng@623: dongsheng@623: Reads all the po files specified in po_in_name dongsheng@623: dongsheng@623: =item b. dongsheng@623: dongsheng@623: Reads all original documents specified in file_in_name dongsheng@623: dongsheng@623: =item c. dongsheng@623: dongsheng@623: Parses the document dongsheng@623: dongsheng@623: =item d. dongsheng@623: dongsheng@623: Reads and applies all the addenda specified dongsheng@623: dongsheng@623: =item e. dongsheng@623: dongsheng@623: Writes the translated document to file_out_name (if given) dongsheng@623: dongsheng@623: =item f. dongsheng@623: dongsheng@623: Writes the extracted po file to po_out_name (if given) dongsheng@623: dongsheng@623: =back dongsheng@623: dongsheng@623: ARGUMENTS, beside the ones accepted by new() (with expected type): dongsheng@623: dongsheng@623: =over 4 dongsheng@623: dongsheng@623: =item file_in_name (@) dongsheng@623: dongsheng@623: List of filenames where we should read the input document. dongsheng@623: dongsheng@623: =item file_in_charset ($) dongsheng@623: dongsheng@623: Charset used in the input document (if it isn't specified, it will try dongsheng@623: to detect it from the input document). dongsheng@623: dongsheng@623: =item file_out_name ($) dongsheng@623: dongsheng@623: Filename where we should write the output document. dongsheng@623: dongsheng@623: =item file_out_charset ($) dongsheng@623: dongsheng@623: Charset used in the output document (if it isn't specified, it will use dongsheng@623: the po file charset). dongsheng@623: dongsheng@623: =item po_in_name (@) dongsheng@623: dongsheng@623: List of filenames where we should read the input po files from, containing dongsheng@623: the translation which will be used to translate the document. dongsheng@623: dongsheng@623: =item po_out_name ($) dongsheng@623: dongsheng@623: Filename where we should write the output po file, containing the strings dongsheng@623: extracted from the input document. dongsheng@623: dongsheng@623: =item addendum (@) dongsheng@623: dongsheng@623: List of filenames where we should read the addenda from. dongsheng@623: dongsheng@623: =item addendum_charset ($) dongsheng@623: dongsheng@623: Charset for the addenda. dongsheng@623: dongsheng@623: =back dongsheng@623: dongsheng@623: =item new(%) dongsheng@623: dongsheng@623: Create a new Po4a document. Accepted options (but be in a hash): dongsheng@623: dongsheng@623: =over 4 dongsheng@623: dongsheng@623: =item verbose ($) dongsheng@623: dongsheng@623: Sets the verbosity. dongsheng@623: dongsheng@623: =item debug ($) dongsheng@623: dongsheng@623: Sets the debugging. dongsheng@623: dongsheng@623: =back dongsheng@623: dongsheng@623: =cut dongsheng@623: dongsheng@623: sub process { dongsheng@623: ## Determine if we were called via an object-ref or a classname dongsheng@623: my $self = shift; dongsheng@623: dongsheng@623: ## Any remaining arguments are treated as initial values for the dongsheng@623: ## hash that is used to represent this object. dongsheng@623: my %params = @_; dongsheng@623: dongsheng@623: # Build the args for new() dongsheng@623: my %newparams = (); dongsheng@623: foreach (keys %params) { dongsheng@623: next if ($_ eq 'po_in_name' || dongsheng@623: $_ eq 'po_out_name' || dongsheng@623: $_ eq 'file_in_name' || dongsheng@623: $_ eq 'file_in_charset' || dongsheng@623: $_ eq 'file_out_name' || dongsheng@623: $_ eq 'file_out_charset' || dongsheng@623: $_ eq 'addendum' || dongsheng@623: $_ eq 'addendum_charset'); dongsheng@623: $newparams{$_}=$params{$_}; dongsheng@623: } dongsheng@623: dongsheng@623: $self->detected_charset($params{'file_in_charset'}); dongsheng@623: $self->{TT}{'file_out_charset'}=$params{'file_out_charset'}; dongsheng@623: if (defined($self->{TT}{'file_out_charset'}) and dongsheng@623: length($self->{TT}{'file_out_charset'})) { dongsheng@623: $self->{TT}{'file_out_encoder'} = find_encoding($self->{TT}{'file_out_charset'}); dongsheng@623: } dongsheng@623: $self->{TT}{'addendum_charset'}=$params{'addendum_charset'}; dongsheng@623: dongsheng@623: foreach my $file (@{$params{'po_in_name'}}) { dongsheng@623: print STDERR "readpo($file)... " if $self->debug(); dongsheng@623: $self->readpo($file); dongsheng@623: print STDERR "done.\n" if $self->debug() dongsheng@623: } dongsheng@623: foreach my $file (@{$params{'file_in_name'}}) { dongsheng@623: print STDERR "read($file)..." if $self->debug(); dongsheng@623: $self->read($file); dongsheng@623: print STDERR "done.\n" if $self->debug(); dongsheng@623: } dongsheng@623: print STDERR "parse..." if $self->debug(); dongsheng@623: $self->parse(); dongsheng@623: print STDERR "done.\n" if $self->debug(); dongsheng@623: foreach my $file (@{$params{'addendum'}}) { dongsheng@623: print STDERR "addendum($file)..." if $self->debug(); dongsheng@623: $self->addendum($file) || die "An addendum failed\n"; dongsheng@623: print STDERR "done.\n" if $self->debug(); dongsheng@623: } dongsheng@623: if (defined $params{'file_out_name'}) { dongsheng@623: print STDERR "write(".$params{'file_out_name'}.")... " dongsheng@623: if $self->debug(); dongsheng@623: $self->write($params{'file_out_name'}); dongsheng@623: print STDERR "done.\n" if $self->debug(); dongsheng@623: } dongsheng@623: if (defined $params{'po_out_name'}) { dongsheng@623: print STDERR "writepo(".$params{'po_out_name'}.")... " dongsheng@623: if $self->debug(); dongsheng@623: $self->writepo($params{'po_out_name'}); dongsheng@623: print STDERR "done.\n" if $self->debug(); dongsheng@623: } dongsheng@623: return $self; dongsheng@623: } dongsheng@623: dongsheng@623: sub new { dongsheng@623: ## Determine if we were called via an object-ref or a classname dongsheng@623: my $this = shift; dongsheng@623: my $class = ref($this) || $this; dongsheng@623: my $self = { }; dongsheng@623: my %options=@_; dongsheng@623: ## Bless ourselves into the desired class and perform any initialization dongsheng@623: bless $self, $class; dongsheng@623: dongsheng@623: ## initialize the plugin dongsheng@623: # prevent the plugin from croaking on the options intended for Po.pm dongsheng@623: $self->{options}{'porefs'} = ''; dongsheng@623: # let the plugin parse the options and such dongsheng@623: $self->initialize(%options); dongsheng@623: dongsheng@623: ## Create our private data dongsheng@623: my %po_options; dongsheng@623: $po_options{'porefs'} = $self->{options}{'porefs'}; dongsheng@623: dongsheng@623: # private data dongsheng@623: $self->{TT}=(); dongsheng@623: $self->{TT}{po_in}=Locale::Po4a::Po->new(); dongsheng@623: $self->{TT}{po_out}=Locale::Po4a::Po->new(\%po_options); dongsheng@623: # Warning, this is an array of array: dongsheng@623: # The document is splited on lines, and for each dongsheng@623: # [0] is the line content, [1] is the reference [2] the type dongsheng@623: $self->{TT}{doc_in}=(); dongsheng@623: $self->{TT}{doc_out}=(); dongsheng@623: if (defined $options{'verbose'}) { dongsheng@623: $self->{TT}{verbose} = $options{'verbose'}; dongsheng@623: } dongsheng@623: if (defined $options{'debug'}) { dongsheng@623: $self->{TT}{debug} = $options{'debug'}; dongsheng@623: } dongsheng@623: # Input document is in ascii until we prove the opposite (in read()) dongsheng@623: $self->{TT}{ascii_input}=1; dongsheng@623: # We try not to use utf unless it's forced from the outside (in case the dongsheng@623: # document isn't in ascii) dongsheng@623: $self->{TT}{utf_mode}=0; dongsheng@623: dongsheng@623: dongsheng@623: return $self; dongsheng@623: } dongsheng@623: dongsheng@623: =back dongsheng@623: dongsheng@623: =head2 Manipulating document files dongsheng@623: dongsheng@623: =over 4 dongsheng@623: dongsheng@623: =item read($) dongsheng@623: dongsheng@623: Add another input document at the end of the existing one. The argument is dongsheng@623: the filename to read. dongsheng@623: dongsheng@623: Please note that it does not parse anything. You should use the parse() dongsheng@623: function when you're done with packing input files into the document. dongsheng@623: dongsheng@623: =cut dongsheng@623: dongsheng@623: #' dongsheng@623: sub read() { dongsheng@623: my $self=shift; dongsheng@623: my $filename=shift dongsheng@623: or croak wrap_msg(dgettext("po4a", "Can't read from file without having a filename")); dongsheng@623: my $linenum=0; dongsheng@623: dongsheng@623: open INPUT,"<$filename" dongsheng@623: or croak wrap_msg(dgettext("po4a", "Can't read from %s: %s"), $filename, $!); dongsheng@623: while (defined (my $textline = )) { dongsheng@623: $linenum++; dongsheng@623: my $ref="$filename:$linenum"; dongsheng@623: my @entry=($textline,$ref); dongsheng@623: push @{$self->{TT}{doc_in}}, @entry; dongsheng@623: dongsheng@623: if (!defined($self->{TT}{'file_in_charset'})) { dongsheng@623: # Detect if this file has non-ascii characters dongsheng@623: if($self->{TT}{ascii_input}) { dongsheng@623: my $decoder = guess_encoding($textline); dongsheng@623: if (!ref($decoder) or $decoder !~ /Encode::XS=/) { dongsheng@623: # We have detected a non-ascii line dongsheng@623: $self->{TT}{ascii_input} = 0; dongsheng@623: # Save the reference for future error message dongsheng@623: $self->{TT}{non_ascii_ref} ||= $ref; dongsheng@623: } dongsheng@623: } dongsheng@623: } dongsheng@623: } dongsheng@623: close INPUT dongsheng@623: or croak wrap_msg(dgettext("po4a", "Can't close %s after reading: %s"), $filename, $!); dongsheng@623: dongsheng@623: } dongsheng@623: dongsheng@623: =item write($) dongsheng@623: dongsheng@623: Write the translated document to the given filename. dongsheng@623: dongsheng@623: =cut dongsheng@623: dongsheng@623: sub write { dongsheng@623: my $self=shift; dongsheng@623: my $filename=shift dongsheng@623: or croak wrap_msg(dgettext("po4a", "Can't write to a file without filename")); dongsheng@623: dongsheng@623: my $fh; dongsheng@623: if ($filename eq '-') { dongsheng@623: $fh=\*STDOUT; dongsheng@623: } else { dongsheng@623: # make sure the directory in which we should write the localized file exists dongsheng@623: my $dir = $filename; dongsheng@623: if ($dir =~ m|/|) { dongsheng@623: $dir =~ s|/[^/]*$||; dongsheng@623: dongsheng@623: File::Path::mkpath($dir, 0, 0755) # Croaks on error dongsheng@623: if (length ($dir) && ! -e $dir); dongsheng@623: } dongsheng@623: open $fh,">$filename" dongsheng@623: or croak wrap_msg(dgettext("po4a", "Can't write to %s: %s"), $filename, $!); dongsheng@623: } dongsheng@623: dongsheng@623: map { print $fh $_ } $self->docheader(); dongsheng@623: map { print $fh $_ } @{$self->{TT}{doc_out}}; dongsheng@623: dongsheng@623: if ($filename ne '-') { dongsheng@623: close $fh or croak wrap_msg(dgettext("po4a", "Can't close %s after writing: %s"), $filename, $!); dongsheng@623: } dongsheng@623: dongsheng@623: } dongsheng@623: dongsheng@623: =back dongsheng@623: dongsheng@623: =head2 Manipulating po files dongsheng@623: dongsheng@623: =over 4 dongsheng@623: dongsheng@623: =item readpo($) dongsheng@623: dongsheng@623: Add the content of a file (which name is passed in argument) to the dongsheng@623: existing input po. The old content is not discarded. dongsheng@623: dongsheng@623: =item writepo($) dongsheng@623: dongsheng@623: Write the extracted po file to the given filename. dongsheng@623: dongsheng@623: =item stats() dongsheng@623: dongsheng@623: Returns some statistics about the translation done so far. Please note that dongsheng@623: it's not the same statistics than the one printed by msgfmt dongsheng@623: --statistic. Here, it's stats about recent usage of the po file, while dongsheng@623: msgfmt reports the status of the file. It is a wrapper to the dongsheng@623: Locale::Po4a::Po::stats_get function applied to the input po file. Example dongsheng@623: of use: dongsheng@623: dongsheng@623: [normal use of the po4a document...] dongsheng@623: dongsheng@623: ($percent,$hit,$queries) = $document->stats(); dongsheng@623: print "We found translations for $percent\% ($hit from $queries) of strings.\n"; dongsheng@623: dongsheng@623: =back dongsheng@623: dongsheng@623: =cut dongsheng@623: dongsheng@623: sub getpoout { dongsheng@623: return $_[0]->{TT}{po_out}; dongsheng@623: } dongsheng@623: sub setpoout { dongsheng@623: $_[0]->{TT}{po_out} = $_[1]; dongsheng@623: } dongsheng@623: sub readpo { dongsheng@623: $_[0]->{TT}{po_in}->read($_[1]); dongsheng@623: } dongsheng@623: sub writepo { dongsheng@623: $_[0]->{TT}{po_out}->write( $_[1] ); dongsheng@623: } dongsheng@623: sub stats { dongsheng@623: return $_[0]->{TT}{po_in}->stats_get(); dongsheng@623: } dongsheng@623: dongsheng@623: =head2 Manipulating addenda dongsheng@623: dongsheng@623: =over 4 dongsheng@623: dongsheng@623: =item addendum($) dongsheng@623: dongsheng@623: Please refer to L for more information on what addenda are, dongsheng@623: and how translators should write them. To apply an addendum to the translated dongsheng@623: document, simply pass its filename to this function and you are done ;) dongsheng@623: dongsheng@623: This function returns a non-null integer on error. dongsheng@623: dongsheng@623: =cut dongsheng@623: dongsheng@623: # Internal function to read the header. dongsheng@623: sub addendum_parse { dongsheng@623: my ($filename,$header)=shift; dongsheng@623: dongsheng@623: my ($errcode,$mode,$position,$boundary,$bmode,$content)= dongsheng@623: (1,"","","","",""); dongsheng@623: dongsheng@623: unless (open (INS, "<$filename")) { dongsheng@623: warn wrap_msg(dgettext("po4a", "Can't read from %s: %s"), $filename, $!); dongsheng@623: goto END_PARSE_ADDFILE; dongsheng@623: } dongsheng@623: dongsheng@623: unless (defined ($header=) && $header) { dongsheng@623: warn wrap_msg(dgettext("po4a", "Can't read Po4a header from %s."), $filename); dongsheng@623: goto END_PARSE_ADDFILE; dongsheng@623: } dongsheng@623: dongsheng@623: unless ($header =~ s/PO4A-HEADER://i) { dongsheng@623: warn wrap_msg(dgettext("po4a", "First line of %s does not look like a Po4a header."), $filename); dongsheng@623: goto END_PARSE_ADDFILE; dongsheng@623: } dongsheng@623: foreach my $part (split(/;/,$header)) { dongsheng@623: unless ($part =~ m/^\s*([^=]*)=(.*)$/) { dongsheng@623: warn wrap_msg(dgettext("po4a", "Syntax error in Po4a header of %s, near \"%s\""), $filename, $part); dongsheng@623: goto END_PARSE_ADDFILE; dongsheng@623: } dongsheng@623: my ($key,$value)=($1,$2); dongsheng@623: $key=lc($key); dongsheng@623: if ($key eq 'mode') { $mode=lc($value); dongsheng@623: } elsif ($key eq 'position') { $position=$value; dongsheng@623: } elsif ($key eq 'endboundary') { dongsheng@623: $boundary=$value; dongsheng@623: $bmode='after'; dongsheng@623: } elsif ($key eq 'beginboundary') { dongsheng@623: $boundary=$value; dongsheng@623: $bmode='before'; dongsheng@623: } else { dongsheng@623: warn wrap_msg(dgettext("po4a", "Invalid argument in the Po4a header of %s: %s"), $filename, $key); dongsheng@623: goto END_PARSE_ADDFILE; dongsheng@623: } dongsheng@623: } dongsheng@623: dongsheng@623: unless (length($mode)) { dongsheng@623: warn wrap_msg(dgettext("po4a", "The Po4a header of %s does not define the mode."), $filename); dongsheng@623: goto END_PARSE_ADDFILE; dongsheng@623: } dongsheng@623: unless ($mode eq "before" || $mode eq "after") { dongsheng@623: warn wrap_msg(dgettext("po4a", "Mode invalid in the Po4a header of %s: should be 'before' or 'after' not %s."), $filename, $mode); dongsheng@623: goto END_PARSE_ADDFILE; dongsheng@623: } dongsheng@623: dongsheng@623: unless (length($position)) { dongsheng@623: warn wrap_msg(dgettext("po4a", "The Po4a header of %s does not define the position."), $filename); dongsheng@623: goto END_PARSE_ADDFILE; dongsheng@623: } dongsheng@623: unless ($mode eq "before" || length($boundary)) { dongsheng@623: warn wrap_msg(dgettext("po4a", "No ending boundary given in the Po4a header, but mode=after.")); dongsheng@623: goto END_PARSE_ADDFILE; dongsheng@623: } dongsheng@623: dongsheng@623: while (defined(my $line = )) { dongsheng@623: $content .= $line; dongsheng@623: } dongsheng@623: close INS; dongsheng@623: dongsheng@623: $errcode=0; dongsheng@623: END_PARSE_ADDFILE: dongsheng@623: return ($errcode,$mode,$position,$boundary,$bmode,$content); dongsheng@623: } dongsheng@623: dongsheng@623: sub mychomp { dongsheng@623: my ($str) = shift; dongsheng@623: chomp($str); dongsheng@623: return $str; dongsheng@623: } dongsheng@623: dongsheng@623: sub addendum { dongsheng@623: my ($self,$filename) = @_; dongsheng@623: dongsheng@623: print STDERR "Apply addendum $filename..." if $self->debug(); dongsheng@623: unless ($filename) { dongsheng@623: warn wrap_msg(dgettext("po4a", dongsheng@623: "Can't apply addendum when not given the filename")); dongsheng@623: return 0; dongsheng@623: } dongsheng@623: die wrap_msg(dgettext("po4a", "Addendum %s does not exist."), $filename) dongsheng@623: unless -e $filename; dongsheng@623: dongsheng@623: my ($errcode,$mode,$position,$boundary,$bmode,$content)= dongsheng@623: addendum_parse($filename); dongsheng@623: return 0 if ($errcode); dongsheng@623: dongsheng@623: print STDERR "mode=$mode;pos=$position;bound=$boundary;bmode=$bmode;ctn=$content\n" dongsheng@623: if $self->debug(); dongsheng@623: dongsheng@623: # We only recode the addendum if an origin charset is specified, else we dongsheng@623: # suppose it's already in the output document's charset dongsheng@623: if (defined($self->{TT}{'addendum_charset'}) && dongsheng@623: length($self->{TT}{'addendum_charset'})) { dongsheng@623: Encode::from_to($content,$self->{TT}{'addendum_charset'}, dongsheng@623: $self->get_out_charset); dongsheng@623: } dongsheng@623: dongsheng@623: my $found = scalar grep { /$position/ } @{$self->{TT}{doc_out}}; dongsheng@623: if ($found == 0) { dongsheng@623: warn wrap_msg(dgettext("po4a", dongsheng@623: "No candidate position for the addendum %s."), $filename); dongsheng@623: return 0; dongsheng@623: } dongsheng@623: if ($found > 1) { dongsheng@623: warn wrap_msg(dgettext("po4a", dongsheng@623: "More than one candidate position found for the addendum %s."), $filename); dongsheng@623: return 0; dongsheng@623: } dongsheng@623: dongsheng@623: if ($mode eq "before") { dongsheng@623: if ($self->verbose() > 1 || $self->debug() ) { dongsheng@623: map { print STDERR wrap_msg(dgettext("po4a", "Addendum '%s' applied before this line: %s"), $filename, $_) if (/$position/); dongsheng@623: } @{$self->{TT}{doc_out}}; dongsheng@623: } dongsheng@623: @{$self->{TT}{doc_out}} = map { /$position/ ? ($content,$_) : $_ dongsheng@623: } @{$self->{TT}{doc_out}}; dongsheng@623: } else { dongsheng@623: my @newres=(); dongsheng@623: dongsheng@623: do { dongsheng@623: # make sure it doesnt whine on empty document dongsheng@623: my $line = scalar @{$self->{TT}{doc_out}} ? shift @{$self->{TT}{doc_out}} : ""; dongsheng@623: push @newres,$line; dongsheng@623: my $outline=mychomp($line); dongsheng@623: $outline =~ s/^[ \t]*//; dongsheng@623: dongsheng@623: if ($line =~ m/$position/) { dongsheng@623: while ($line=shift @{$self->{TT}{doc_out}}) { dongsheng@623: last if ($line=~/$boundary/); dongsheng@623: push @newres,$line; dongsheng@623: } dongsheng@623: if (defined $line) { dongsheng@623: if ($bmode eq 'before') { dongsheng@623: print wrap_msg(dgettext("po4a", dongsheng@623: "Addendum '%s' applied before this line: %s"), dongsheng@623: $filename, $outline) dongsheng@623: if ($self->verbose() > 1 || $self->debug()); dongsheng@623: push @newres,$content; dongsheng@623: push @newres,$line; dongsheng@623: } else { dongsheng@623: print wrap_msg(dgettext("po4a", dongsheng@623: "Addendum '%s' applied after the line: %s."), dongsheng@623: $filename, $outline) dongsheng@623: if ($self->verbose() > 1 || $self->debug()); dongsheng@623: push @newres,$line; dongsheng@623: push @newres,$content; dongsheng@623: } dongsheng@623: } else { dongsheng@623: print wrap_msg(dgettext("po4a", "Addendum '%s' applied at the end of the file."), $filename) dongsheng@623: if ($self->verbose() > 1 || $self->debug()); dongsheng@623: push @newres,$content; dongsheng@623: } dongsheng@623: } dongsheng@623: } while (scalar @{$self->{TT}{doc_out}}); dongsheng@623: @{$self->{TT}{doc_out}} = @newres; dongsheng@623: } dongsheng@623: print STDERR "done.\n" if $self->debug(); dongsheng@623: return 1; dongsheng@623: } dongsheng@623: dongsheng@623: =back dongsheng@623: dongsheng@623: =head1 INTERNAL FUNCTIONS used to write derivated parsers dongsheng@623: dongsheng@623: =head2 Getting input, providing output dongsheng@623: dongsheng@623: Four functions are provided to get input and return output. They are very dongsheng@623: similar to shift/unshift and push/pop. The first pair is about input, while dongsheng@623: the second is about output. Mnemonic: in input, you are interested in the dongsheng@623: first line, what shift gives, and in output you want to add your result at dongsheng@623: the end, like push does. dongsheng@623: dongsheng@623: =over 4 dongsheng@623: dongsheng@623: =item shiftline() dongsheng@623: dongsheng@623: This function returns the next line of the doc_in to be parsed and its dongsheng@623: reference (packed as an array). dongsheng@623: dongsheng@623: =item unshiftline($$) dongsheng@623: dongsheng@623: Unshifts a line of the input document and its reference. dongsheng@623: dongsheng@623: =item pushline($) dongsheng@623: dongsheng@623: Push a new line to the doc_out. dongsheng@623: dongsheng@623: =item popline() dongsheng@623: dongsheng@623: Pop the last pushed line from the doc_out. dongsheng@623: dongsheng@623: =back dongsheng@623: dongsheng@623: =cut dongsheng@623: dongsheng@623: sub shiftline { dongsheng@623: my ($line,$ref)=(shift @{$_[0]->{TT}{doc_in}}, dongsheng@623: shift @{$_[0]->{TT}{doc_in}}); dongsheng@623: return ($line,$ref); dongsheng@623: } dongsheng@623: sub unshiftline { dongsheng@623: my $self = shift; dongsheng@623: unshift @{$self->{TT}{doc_in}},@_; dongsheng@623: } dongsheng@623: dongsheng@623: sub pushline { push @{$_[0]->{TT}{doc_out}}, $_[1] if defined $_[1]; } dongsheng@623: sub popline { return pop @{$_[0]->{TT}{doc_out}}; } dongsheng@623: dongsheng@623: =head2 Marking strings as translatable dongsheng@623: dongsheng@623: One function is provided to handle the text which should be translated. dongsheng@623: dongsheng@623: =over 4 dongsheng@623: dongsheng@623: =item translate($$$) dongsheng@623: dongsheng@623: Mandatory arguments: dongsheng@623: dongsheng@623: =over 2 dongsheng@623: dongsheng@623: =item - dongsheng@623: dongsheng@623: A string to translate dongsheng@623: dongsheng@623: =item - dongsheng@623: dongsheng@623: The reference of this string (ie, position in inputfile) dongsheng@623: dongsheng@623: =item - dongsheng@623: dongsheng@623: The type of this string (ie, the textual description of its structural role dongsheng@623: ; used in Locale::Po4a::Po::gettextization() ; see also L, dongsheng@623: section I) dongsheng@623: dongsheng@623: =back dongsheng@623: dongsheng@623: This function can also take some extra arguments. They must be organized as dongsheng@623: a hash. For example: dongsheng@623: dongsheng@623: $self->translate("string","ref","type", dongsheng@623: 'wrap' => 1); dongsheng@623: dongsheng@623: =over dongsheng@623: dongsheng@623: =item wrap dongsheng@623: dongsheng@623: boolean indicating whether we can consider that whitespaces in string are dongsheng@623: not important. If yes, the function canonizes the string before looking for dongsheng@623: a translation or extracting it, and wraps the translation. dongsheng@623: dongsheng@623: =item wrapcol dongsheng@623: dongsheng@623: The column at which we should wrap (default: 76). dongsheng@623: dongsheng@623: =item comment dongsheng@623: dongsheng@623: An extra comment to add to the entry. dongsheng@623: dongsheng@623: =back dongsheng@623: dongsheng@623: Actions: dongsheng@623: dongsheng@623: =over 2 dongsheng@623: dongsheng@623: =item - dongsheng@623: dongsheng@623: Pushes the string, reference and type to po_out. dongsheng@623: dongsheng@623: =item - dongsheng@623: dongsheng@623: Returns the translation of the string (as found in po_in) so that the dongsheng@623: parser can build the doc_out. dongsheng@623: dongsheng@623: =item - dongsheng@623: dongsheng@623: Handles the charsets to recode the strings before sending them to dongsheng@623: po_out and before returning the translations. dongsheng@623: dongsheng@623: =back dongsheng@623: dongsheng@623: =back dongsheng@623: dongsheng@623: =cut dongsheng@623: dongsheng@623: sub translate { dongsheng@623: my $self=shift; dongsheng@623: my ($string,$ref,$type)=(shift,shift,shift); dongsheng@623: my (%options)=@_; dongsheng@623: dongsheng@623: # my $validoption="wrap wrapcol"; dongsheng@623: # my %validoption; dongsheng@623: dongsheng@623: return "" unless defined($string) && length($string); dongsheng@623: dongsheng@623: # map { $validoption{$_}=1 } (split(/ /,$validoption)); dongsheng@623: # foreach (keys %options) { dongsheng@623: # Carp::confess "internal error: translate() called with unknown arg $_. Valid options: $validoption" dongsheng@623: # unless $validoption{$_}; dongsheng@623: # } dongsheng@623: dongsheng@623: my $in_charset; dongsheng@623: if ($self->{TT}{ascii_input}) { dongsheng@623: $in_charset = "ascii"; dongsheng@623: } else { dongsheng@623: if (defined($self->{TT}{'file_in_charset'}) and dongsheng@623: length($self->{TT}{'file_in_charset'}) and dongsheng@623: $self->{TT}{'file_in_charset'} !~ m/ascii/i) { dongsheng@623: $in_charset=$self->{TT}{'file_in_charset'}; dongsheng@623: } else { dongsheng@623: # FYI, the document charset have to be determined *before* we see the first dongsheng@623: # string to recode. dongsheng@623: die wrap_mod("po4a", dgettext("po4a", "Couldn't determine the input document's charset. Please specify it on the command line. (non-ascii char at %s)"), $self->{TT}{non_ascii_ref}) dongsheng@623: } dongsheng@623: } dongsheng@623: dongsheng@623: if ($self->{TT}{po_in}->get_charset ne "CHARSET") { dongsheng@623: $string = encode_from_to($string, dongsheng@623: $self->{TT}{'file_in_encoder'}, dongsheng@623: $self->{TT}{po_in}{encoder}); dongsheng@623: } dongsheng@623: dongsheng@623: if (defined $options{'wrapcol'} && $options{'wrapcol'} < 0) { dongsheng@623: # FIXME: should be the parameter given with --width dongsheng@623: $options{'wrapcol'} = 76 + $options{'wrapcol'}; dongsheng@623: } dongsheng@623: my $transstring = $self->{TT}{po_in}->gettext($string, dongsheng@623: 'wrap' => $options{'wrap'}||0, dongsheng@623: 'wrapcol' => $options{'wrapcol'}); dongsheng@623: dongsheng@623: if ($self->{TT}{po_in}->get_charset ne "CHARSET") { dongsheng@623: my $out_encoder = $self->{TT}{'file_out_encoder'}; dongsheng@623: unless (defined $out_encoder) { dongsheng@623: $out_encoder = find_encoding($self->get_out_charset) dongsheng@623: } dongsheng@623: $transstring = encode_from_to($transstring, dongsheng@623: $self->{TT}{po_in}{encoder}, dongsheng@623: $out_encoder); dongsheng@623: } dongsheng@623: dongsheng@623: # If the input document isn't completely in ascii, we should see what to dongsheng@623: # do with the current string dongsheng@623: unless ($self->{TT}{ascii_input}) { dongsheng@623: my $out_charset = $self->{TT}{po_out}->get_charset; dongsheng@623: # We set the output po charset dongsheng@623: if ($out_charset eq "CHARSET") { dongsheng@623: if ($self->{TT}{utf_mode}) { dongsheng@623: $out_charset="utf-8"; dongsheng@623: } else { dongsheng@623: $out_charset=$in_charset; dongsheng@623: } dongsheng@623: $self->{TT}{po_out}->set_charset($out_charset); dongsheng@623: } dongsheng@623: if ( $in_charset !~ /^$out_charset$/i ) { dongsheng@623: Encode::from_to($string,$in_charset,$out_charset); dongsheng@623: if (defined($options{'comment'}) and length($options{'comment'})) { dongsheng@623: Encode::from_to($options{'comment'},$in_charset,$out_charset); dongsheng@623: } dongsheng@623: } dongsheng@623: } dongsheng@623: dongsheng@623: # the comments provided by the modules are automatic comments from the PO point of view dongsheng@623: $self->{TT}{po_out}->push('msgid' => $string, dongsheng@623: 'reference' => $ref, dongsheng@623: 'type' => $type, dongsheng@623: 'automatic' => $options{'comment'}, dongsheng@623: 'wrap' => $options{'wrap'}||0, dongsheng@623: 'wrapcol' => $options{'wrapcol'}); dongsheng@623: dongsheng@623: # if ($self->{TT}{po_in}->get_charset ne "CHARSET") { dongsheng@623: # Encode::from_to($transstring,$self->{TT}{po_in}->get_charset, dongsheng@623: # $self->get_out_charset); dongsheng@623: # } dongsheng@623: dongsheng@623: if ($options{'wrap'}||0) { dongsheng@623: $transstring =~ s/( *)$//s; dongsheng@623: my $trailing_spaces = $1||""; dongsheng@623: $transstring =~ s/ *$//gm; dongsheng@623: $transstring .= $trailing_spaces; dongsheng@623: } dongsheng@623: dongsheng@623: return $transstring; dongsheng@623: } dongsheng@623: dongsheng@623: =head2 Misc functions dongsheng@623: dongsheng@623: =over 4 dongsheng@623: dongsheng@623: =item verbose() dongsheng@623: dongsheng@623: Returns if the verbose option was passed during the creation of the dongsheng@623: TransTractor. dongsheng@623: dongsheng@623: =cut dongsheng@623: dongsheng@623: sub verbose { dongsheng@623: if (defined $_[1]) { dongsheng@623: $_[0]->{TT}{verbose} = $_[1]; dongsheng@623: } else { dongsheng@623: return $_[0]->{TT}{verbose} || 0; # undef and 0 have the same meaning, but one generates warnings dongsheng@623: } dongsheng@623: } dongsheng@623: dongsheng@623: =item debug() dongsheng@623: dongsheng@623: Returns if the debug option was passed during the creation of the dongsheng@623: TransTractor. dongsheng@623: dongsheng@623: =cut dongsheng@623: dongsheng@623: sub debug { dongsheng@623: return $_[0]->{TT}{debug}; dongsheng@623: } dongsheng@623: dongsheng@623: =item detected_charset($) dongsheng@623: dongsheng@623: This tells TransTractor that a new charset (the first argument) has been dongsheng@623: detected from the input document. It can usually be read from the document dongsheng@623: header. Only the first charset will remain, coming either from the dongsheng@623: process() arguments or detected from the document. dongsheng@623: dongsheng@623: =cut dongsheng@623: dongsheng@623: sub detected_charset { dongsheng@623: my ($self,$charset)=(shift,shift); dongsheng@623: unless (defined($self->{TT}{'file_in_charset'}) and dongsheng@623: length($self->{TT}{'file_in_charset'}) ) { dongsheng@623: $self->{TT}{'file_in_charset'}=$charset; dongsheng@623: if (defined $charset) { dongsheng@623: $self->{TT}{'file_in_encoder'}=find_encoding($charset); dongsheng@623: } dongsheng@623: } dongsheng@623: dongsheng@623: if (defined $self->{TT}{'file_in_charset'} and dongsheng@623: length $self->{TT}{'file_in_charset'} and dongsheng@623: $self->{TT}{'file_in_charset'} !~ m/ascii/i) { dongsheng@623: $self->{TT}{ascii_input}=0; dongsheng@623: } dongsheng@623: } dongsheng@623: dongsheng@623: =item get_out_charset() dongsheng@623: dongsheng@623: This function will return the charset that should be used in the output dongsheng@623: document (usually useful to substitute the input document's detected charset dongsheng@623: where it has been found). dongsheng@623: dongsheng@623: It will use the output charset specified in the command line. If it wasn't dongsheng@623: specified, it will use the input po's charset, and if the input po has the dongsheng@623: default "CHARSET", it will return the input document's charset, so that no dongsheng@623: encoding is performed. dongsheng@623: dongsheng@623: =cut dongsheng@623: dongsheng@623: sub get_out_charset { dongsheng@623: my $self=shift; dongsheng@623: my $charset; dongsheng@623: dongsheng@623: # Use the value specified at the command line dongsheng@623: if (defined($self->{TT}{'file_out_charset'}) and dongsheng@623: length($self->{TT}{'file_out_charset'})) { dongsheng@623: $charset=$self->{TT}{'file_out_charset'}; dongsheng@623: } else { dongsheng@623: if ($self->{TT}{utf_mode} && $self->{TT}{ascii_input}) { dongsheng@623: $charset="utf-8"; dongsheng@623: } else { dongsheng@623: $charset=$self->{TT}{po_in}->get_charset; dongsheng@623: $charset=$self->{TT}{'file_in_charset'} dongsheng@623: if $charset eq "CHARSET" and dongsheng@623: defined($self->{TT}{'file_in_charset'}) and dongsheng@623: length($self->{TT}{'file_in_charset'}); dongsheng@623: $charset="ascii" dongsheng@623: if $charset eq "CHARSET"; dongsheng@623: } dongsheng@623: } dongsheng@623: return $charset; dongsheng@623: } dongsheng@623: dongsheng@623: =item recode_skipped_text($) dongsheng@623: dongsheng@623: This function returns the recoded text passed as argument, from the input dongsheng@623: document's charset to the output document's one. This isn't needed when dongsheng@623: translating a string (translate() recodes everything itself), but it is when dongsheng@623: you skip a string from the input document and you want the output document to dongsheng@623: be consistent with the global encoding. dongsheng@623: dongsheng@623: =cut dongsheng@623: dongsheng@623: sub recode_skipped_text { dongsheng@623: my ($self,$text)=(shift,shift); dongsheng@623: unless ($self->{TT}{'ascii_input'}) { dongsheng@623: if(defined($self->{TT}{'file_in_charset'}) and dongsheng@623: length($self->{TT}{'file_in_charset'}) ) { dongsheng@623: $text = encode_from_to($text, dongsheng@623: $self->{TT}{'file_in_encoder'}, dongsheng@623: find_encoding($self->get_out_charset)); dongsheng@623: } else { dongsheng@623: die wrap_mod("po4a", dgettext("po4a", "Couldn't determine the input document's charset. Please specify it on the command line. (non-ascii char at %s)"), $self->{TT}{non_ascii_ref}) dongsheng@623: } dongsheng@623: } dongsheng@623: return $text; dongsheng@623: } dongsheng@623: dongsheng@623: dongsheng@623: # encode_from_to($,$,$) dongsheng@623: # dongsheng@623: # Encode the given text from one encoding to another one. dongsheng@623: # It differs from Encode::from_to because it does not take the name of the dongsheng@623: # encoding in argument, but the encoders (as returned by the dongsheng@623: # Encode::find_encoding() method). Thus it permits to save a bunch dongsheng@623: # of call to find_encoding. dongsheng@623: # dongsheng@623: # If the "from" encoding is undefined, it is considered as UTF-8 (or dongsheng@623: # ascii). dongsheng@623: # If the "to" encoding is undefined, it is considered as UTF-8. dongsheng@623: # dongsheng@623: sub encode_from_to { dongsheng@623: my ($text,$from,$to) = (shift,shift,shift); dongsheng@623: dongsheng@623: if (not defined $from) { dongsheng@623: # for ascii and UTF-8, no conversion needed to get an utf-8 dongsheng@623: # string. dongsheng@623: } else { dongsheng@623: $text = $from->decode($text, 0); dongsheng@623: } dongsheng@623: dongsheng@623: if (not defined $to) { dongsheng@623: # Already in UTF-8, no conversion needed dongsheng@623: } else { dongsheng@623: $text = $to->encode($text, 0); dongsheng@623: } dongsheng@623: dongsheng@623: return $text; dongsheng@623: } dongsheng@623: dongsheng@623: =back dongsheng@623: dongsheng@623: =head1 FUTURE DIRECTIONS dongsheng@623: dongsheng@623: One shortcoming of the current TransTractor is that it can't handle dongsheng@623: translated document containing all languages, like debconf templates, or dongsheng@623: .desktop files. dongsheng@623: dongsheng@623: To address this problem, the only interface changes needed are: dongsheng@623: dongsheng@623: =over 2 dongsheng@623: dongsheng@623: =item - dongsheng@623: dongsheng@623: take a hash as po_in_name (a list per language) dongsheng@623: dongsheng@623: =item - dongsheng@623: dongsheng@623: add an argument to translate to indicate the target language dongsheng@623: dongsheng@623: =item - dongsheng@623: dongsheng@623: make a pushline_all function, which would make pushline of its content for dongsheng@623: all language, using a map-like syntax: dongsheng@623: dongsheng@623: $self->pushline_all({ "Description[".$langcode."]=". dongsheng@623: $self->translate($line,$ref,$langcode) dongsheng@623: }); dongsheng@623: dongsheng@623: =back dongsheng@623: dongsheng@623: Will see if it's enough ;) dongsheng@623: dongsheng@623: =head1 AUTHORS dongsheng@623: dongsheng@623: Denis Barbier dongsheng@623: Martin Quinson (mquinson#debian.org) dongsheng@623: Jordi Vilalta dongsheng@623: dongsheng@623: =cut dongsheng@623: dongsheng@623: 1;