Improve SEO description extracted from documents. #705

Open
buuhuu opened this issue Oct 2, 2024 · 4 comments · May be fixed by #707
Labels
enhancement New feature or request

Comments

buuhuu commented Oct 2, 2024

Is your feature request related to a problem? Please describe.

Currently, if a description is not included in the metadata of a page, it is taken from the first paragraph that has at least 10 words, or that contains a word longer than 25 characters that does not start with http:

/**
 * Extracts the description from the document. note, that the selectAll('div > p') used in
 * jsdom doesn't work as expected in hast
 * @param {Root} hast
 * @see https://github.com/syntax-tree/unist/discussions/66
 */
function extractDescription(hast) {
  let desc = '';
  visit(hast, (node, idx, parent) => {
    if (parent?.tagName === 'div' && node.tagName === 'p') {
      const words = toString(node).trim().split(/\s+/);
      if (words.length >= 10 || words.some((w) => w.length > 25 && !w.startsWith('http'))) {
        desc = `${words.slice(0, 25).join(' ')}${words.length > 25 ? ' ...' : ''}`;
        return EXIT;
      }
    }
    return CONTINUE;
  });
  return desc;
}

That condition is not precise enough. For example, it

  • accepts domain-relative links (starting with /) as description text
  • does not consider the concatenated content of multiple paragraphs
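
To make the first point concrete, here is a minimal sketch that restates only the word check from extractDescription on plain strings. The helper name passesCurrentCheck is hypothetical and not part of the codebase; the two inputs are taken from the example page further down in this issue.

```javascript
// Hypothetical helper: the same word check as in extractDescription,
// applied to a plain string instead of a hast node.
function passesCurrentCheck(text) {
  const words = text.trim().split(/\s+/);
  return words.length >= 10
    || words.some((w) => w.length > 25 && !w.startsWith('http'));
}

// Link text authored as a domain-relative path: a single "word" longer than
// 25 characters that does not start with "http", so it is accepted.
const relativeLink = '/content/dam/metafox/rs/sr/disclaimer-pool/stocklocator/stocklocator-info-icon';

// A real sentence with only 7 words, so it is rejected.
const shortParagraph = 'Odaberite onaj koji najbolje odgovara vašim potrebama.';

console.log(passesCurrentCheck(relativeLink)); // true
console.log(passesCurrentCheck(shortParagraph)); // false
```

So the intro sentence is skipped while the link path is accepted, which is the behavior described above.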

Describe the solution you'd like

I propose considering multiple paragraphs, and including headings, if the 10-word criterion is not met. For example, a page starting with

## Pronađite raspoložive BMW automobile sa lagera.

Odaberite onaj koji najbolje odgovara vašim potrebama.

+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| Stock Locator Model Overview                                                                                                                          |
+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| ## Pogledajte detalje                                                                                                                                 |
+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| [/content/dam/metafox/rs/sr/disclaimer-pool/stocklocator/stocklocator-info-icon](/assets/rs/sr/disclaimer-pool/stocklocator/stocklocator-info-icon)   |
|                                                                                                                                                       |
| ### {count} od {count} vozila                                                                                                                         |
|                                                                                                                                                       |
| [/content/dam/metafox/rs/sr/disclaimer-pool/stocklocator/stocklocator-disclaimer](/assets/rs/sr/disclaimer-pool/stocklocator/stocklocator-disclaimer) |
+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| NEMA PRONAĐENIH VOZILA                                                                                                                                |
|                                                                                                                                                       |
| Nažalost, nisu pronađena vozila koja odgovaraju Vašim kriterijumima. Molimo Vas da resetujete filtere i napravite drugačiji izbor.                    |
+-------------------------------------------------------------------------------------------------------------------------------------------------------+

should have the description "Pronađite raspoložive BMW automobile sa lagera. Odaberite onaj koji najbolje odgovara vašim potrebama."

In any case, an SEO description should probably start with an alphanumeric character.
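
A rough, dependency-free sketch of what this could look like, not a definitive implementation. It assumes a hast-like tree of plain objects; textOf, sketchDescription, and the minWords/maxWords parameters are hypothetical names, and the tree below is a hand-built approximation of the example page. Headings and paragraphs are collected in document order until at least 10 words are gathered, and any block whose text does not start with a letter or digit is skipped.

```javascript
// Hypothetical helper: concatenates the text content of a hast-like node.
function textOf(node) {
  if (node.type === 'text') return node.value;
  return (node.children || []).map(textOf).join('');
}

const HEADINGS = new Set(['h1', 'h2', 'h3', 'h4', 'h5', 'h6']);

// Hypothetical sketch of the proposal: gather words from headings and
// `div > p` paragraphs until at least `minWords` words are collected.
function sketchDescription(root, minWords = 10, maxWords = 25) {
  const words = [];
  const walk = (node, parent) => {
    if (words.length >= minWords) return;
    const isParagraph = node.tagName === 'p' && parent?.tagName === 'div';
    if (isParagraph || HEADINGS.has(node.tagName)) {
      const text = textOf(node).trim();
      // Skip blocks that do not start with a letter or digit, e.g. "/path" link text.
      if (/^[\p{L}\p{N}]/u.test(text)) {
        words.push(...text.split(/\s+/));
      }
      return;
    }
    (node.children || []).forEach((child) => walk(child, node));
  };
  walk(root, null);
  return `${words.slice(0, maxWords).join(' ')}${words.length > maxWords ? ' ...' : ''}`;
}

// Hand-built approximation of the start of the example page.
const el = (tagName, ...children) => ({ type: 'element', tagName, children });
const txt = (value) => ({ type: 'text', value });
const tree = {
  type: 'root',
  children: [el('div',
    el('h2', txt('Pronađite raspoložive BMW automobile sa lagera.')),
    el('p', txt('Odaberite onaj koji najbolje odgovara vašim potrebama.')),
    el('p', el('a', txt('/content/dam/metafox/rs/sr/disclaimer-pool/stocklocator/stocklocator-info-icon'))),
  )],
};

console.log(sketchDescription(tree));
// Pronađite raspoložive BMW automobile sa lagera. Odaberite onaj koji najbolje odgovara vašim potrebama.
```

The sketch drops the existing long-word criterion entirely; whether that criterion should be kept is a separate design decision.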

Describe alternatives you've considered
Updating the descriptions, but that requires the content team and takes time.

Additional context
slack conversation ff

buuhuu added the enhancement (New feature or request) label Oct 2, 2024
@tripodsan
Contributor

> Updating the descriptions, but that requires the content team and takes time.

I think most customers are very conscious about SEO and always provide a tailored description via the page metadata. It's probably rarely the case that the 1st paragraph is a good description. IMO we should remove that feature...

tripodsan added a commit that referenced this issue Oct 3, 2024
tripodsan linked a pull request Oct 3, 2024 that will close this issue
@davidnuescheler

i agree with @tripodsan that a lot of customers set descriptions explicitly, and the automatic description (along with the automatic og:image) is often problematic. personally, i don't like the heuristic approach we use here, and think this is something where we could move to a more declarative approach with templated metadata.

i would argue that, especially in BYOM, the automation of the description can easily be made more project-specific and intelligent, without a regression risk for existing sites.

on a tangent, why do we support URLs that start with a / in the first place? it seems counter to https://www.aem.live/docs/davidsmodel#rule-4-fully-qualified-urls-only, especially as there is no way to produce those in word or gdoc. maybe we should look into limiting things a little more tightly to make sure that BYOM content can easily be edited in all authoring environments, so that users can transition between and mix authoring environments.

@tripodsan
Contributor

@buuhuu can you change the links to absolute URLs?

@buuhuu
Author

buuhuu commented Oct 10, 2024

Not easily. The authors select content using a picker and don't paste URLs as they would in Word authoring. We consider that fair use of the capabilities that AEM as a CMS offers and are reluctant to change the authoring experience. And before we get back into a discussion about whether we should only allow them to author absolute links: there are use cases where they author references that are not previewed/published, like launches.

Making the links absolute programmatically isn't straightforward either, as at the time some content is published we may not know the host yet, and changing it requires republishing everything.

Having said that, it is actually not about the links, but about the link text. This is a poor implementation by the partner. They should have authored a link text for these.
