arxiv-vanity/engrafo

Improve picking main .tex file from a directory

Open

#629 opened on Feb 24, 2019

Labels: batch import, good first issue, help wanted

Description

Some submissions, such as 0706.2986, fail to render because engrafo cannot pick the correct main .tex file. Its current selection criteria are in src/converter/io.js:

// Pick a main .tex file from a directory
async function pickLatexFile(dir) {
  if (dir.endsWith(".tex")) {
    return dir;
  }
  const files = await fs.readdir(dir);
  if (files.includes("ms.tex")) {
    return path.join(dir, "ms.tex");
  }
  if (files.includes("main.tex")) {
    return path.join(dir, "main.tex");
  }
  const texPaths = files.filter(f => f.endsWith(".tex"));
  if (texPaths.length === 0) {
    throw new Error("No .tex files found");
  }
  if (texPaths.length === 1) {
    return path.join(dir, texPaths[0]);
  }
  let docCandidates = [];
  for (let p of texPaths) {
    let data = await fs.readFile(path.join(dir, p));
    if (data && data.includes("\\documentclass")) {
      docCandidates.push(p);
    }
  }
  if (docCandidates.length === 0) {
    throw new Error("No .tex files with \\documentclass or \\documentstyle found");
  }

  if (docCandidates.length === 1) {
    return path.join(dir, docCandidates[0]);
  }

  let bblCandidates = [];
  for (let p of docCandidates) {
    let bbl = p.replace(".tex", ".bbl");
    if (await fs.pathExists(path.join(dir, bbl))) {
      bblCandidates.push(p);
    }
  }

  if (bblCandidates.length > 1) {
    throw new Error(
      `Ambiguous LaTeX path (${bblCandidates.length} candidates)`
    );
  }
  return bblCandidates[0];
}

0706.2986 has two .tex files. The first .tex file, psfig.tex, is not the main .tex file, but it contains the following line:

% To use with LaTeX, use \documentstyle[psfig,...]{...}

Engrafo flags this file as a candidate, along with the second .tex file, townes_arXiv.tex, which is the real main .tex file. The submission contains no .bbl file to disambiguate the two, so bblCandidates stays empty, pickLatexFile returns undefined, and the render fails.

I propose that we use a regex that matches a line containing \documentclass or \documentstyle, but only if that line does not begin with the comment character %. JavaScript has no inline (?m) flag, so the multiline flag goes on the literal; such a regex might look like /^(?!%).*\\document(?:class|style)/m.
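A minimal sketch of that check, assuming the file contents are available as a string (the current code gets a Buffer from fs.readFile, so it would need an encoding argument or a toString() call). The names DOC_DECLARATION and hasDocumentDeclaration and the sample strings are hypothetical and are not part of engrafo.

```javascript
// Hypothetical helper, not part of engrafo.
// Matches a line that mentions \documentclass or \documentstyle, unless the
// line starts with a % comment (optionally preceded by spaces or tabs).
// The "m" flag makes ^ match at the start of every line.
const DOC_DECLARATION = /^(?![ \t]*%).*\\document(?:class|style)/m;

function hasDocumentDeclaration(source) {
  return DOC_DECLARATION.test(source);
}

// Made-up samples modelled on the two files described in this issue.
const psfigSample =
  "% To use with LaTeX, use \\documentstyle[psfig,...]{...}\n\\def\\psfig{}";
const mainSample =
  "% paper source\n\\documentclass{article}\n\\begin{document}";

console.log(hasDocumentDeclaration(psfigSample)); // false
console.log(hasDocumentDeclaration(mainSample)); // true
```

This only skips lines that are comments from the first non-blank character. A declaration mentioned after a % later in a line would still match.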

0902.1226, another submission that fails to render, has a similar problem: an incorrect candidate is chosen because it contains a \documentclass command, although that command is not at the beginning of a line. A better criterion might be to require that \documentclass or \documentstyle begin the line. This would take care of both submissions.
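The stricter criterion could be sketched as follows. Allowing leading spaces or tabs before the command is an assumption that goes slightly beyond a strict "begins the line" reading. The names DOC_AT_LINE_START and startsLineWithDeclaration are hypothetical, and the sample strings are invented, not taken from either submission.

```javascript
// Hypothetical helper, not part of engrafo.
// True only if \documentclass or \documentstyle is the first thing on some
// line, ignoring leading spaces or tabs (that allowance is an assumption).
const DOC_AT_LINE_START = /^[ \t]*\\document(?:class|style)\b/m;

function startsLineWithDeclaration(source) {
  return DOC_AT_LINE_START.test(source);
}

// Invented sample inputs for illustration.
const commented = "% To use with LaTeX, use \\documentstyle[psfig,...]{...}";
const midLine = "Put \\documentclass{article} at the top of your file.";
const realMain = "% preamble\n\\documentclass[12pt]{article}";

console.log(startsLineWithDeclaration(commented)); // false
console.log(startsLineWithDeclaration(midLine)); // false
console.log(startsLineWithDeclaration(realMain)); // true
```

Because a commented-out declaration starts with % rather than a backslash, this single test rejects both the comment case and the mid-line case.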
